date:20200327

[jira] [Created] (ARROW-8250) [C++] Add "random access" / slice read API to RecordBatchFileReader

2020-03-27 Thread Wes McKinney (Jira)

Wes McKinney created ARROW-8250:
---

 Summary: [C++] Add "random access" / slice read API to 
RecordBatchFileReader
 Key: ARROW-8250
 URL: https://issues.apache.org/jira/browse/ARROW-8250
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


If you want to read a small section of a file, it is not possible to easily 
determine the relevant record batches that need "rehydrating".

I would propose the following:

* A way to cheaply read (and cache, so this doesn't have to be done multiple 
times) all the RecordBatch metadata without deserializing the record batch data 
structures themselves
* Based on the metadata you can then determine the range of batches that need 
to be rehydrated and then sliced accordingly to produce the Table of interest

This functionality can also be lifted into the Feather read APIs.
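
A minimal sketch of the second step, assuming the per-batch row counts have already been read from the cached metadata. It uses no Arrow API; `batches_for_row_range` is a hypothetical helper that maps a global row range onto the batches that need rehydrating and the local slice to take from each:

```python
from bisect import bisect_right
from itertools import accumulate


def batches_for_row_range(batch_lengths, start, stop):
    """Return (batch_index, local_start, local_stop) for rows [start, stop).

    batch_lengths is assumed to come from cached record batch metadata.
    """
    # Cumulative end offsets, e.g. lengths [3, 4, 2] -> ends [3, 7, 9].
    ends = list(accumulate(batch_lengths))
    total = ends[-1] if ends else 0

    # Clamp the requested range to the rows that actually exist.
    start = max(0, min(start, total))
    stop = max(start, min(stop, total))
    plan = []
    if start == stop:
        return plan

    # First batch whose end offset lies strictly beyond `start`.
    first = bisect_right(ends, start)
    for i in range(first, len(batch_lengths)):
        batch_begin = ends[i] - batch_lengths[i]
        if batch_begin >= stop:
            break
        local_start = max(start, batch_begin) - batch_begin
        local_stop = min(stop, ends[i]) - batch_begin
        if local_stop > local_start:
            plan.append((i, local_start, local_stop))
    return plan


# Rows 2..7 of a file whose batches hold 3, 4 and 2 rows.
plan = batches_for_row_range([3, 4, 2], 2, 8)
# -> [(0, 2, 3), (1, 0, 4), (2, 0, 1)]
```

Only the batches listed in the plan would need to be deserialized; the rest can be skipped entirely.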



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[VOTE] Accept "DoExchange" RPC to Arrow Flight protocol

2020-03-27 Thread Wes McKinney

Hello,

David M Li has proposed adding a "bidirectional" DoExchange RPC [1] to
the Arrow Flight Protocol [2]. In this client call, datasets (possibly
having different schemas) are sent by both the client and server in a
single transaction. This can be used to offload computational tasks and
other workloads not currently well-supported by the Flight protocol.
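
A minimal sketch of the exchange pattern using plain Python generators, not the Flight API. The names `exchange` and `summarize` are hypothetical, and the dictionaries stand in for record batches; the point is only that the uploaded stream and the returned stream can carry different schemas within one call:

```python
def exchange(client_batches, handler):
    """Feed each uploaded batch to handler and yield whatever it returns."""
    for batch in client_batches:
        # The reply may use a different schema than the upload.
        yield from handler(batch)


def summarize(batch):
    # Upload schema: {"values": [...]}; reply schema: {"count", "total"}.
    yield {"count": len(batch["values"]), "total": sum(batch["values"])}


uploads = iter([{"values": [1, 2, 3]}, {"values": [10]}])
replies = list(exchange(uploads, summarize))
```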

Please vote whether to accept the addition. The vote will be open for
at least 72 hours (since it's Friday, it'll be open for a good deal
longer than 72 hours).

[ ] +1 Accept this addition to the Flight protocol
[ ] +0
[ ] -1 Do not accept the changes because...

Here is my vote: +1

Thanks,
Wes

[1]: https://github.com/apache/arrow/pull/6686

[jira] [Created] (ARROW-8249) [Rust] [DataFusion] Make Table and LogicalPlanBuilder APIs more consistent

2020-03-27 Thread Andy Grove (Jira)

Andy Grove created ARROW-8249:
-

 Summary: [Rust] [DataFusion] Make Table and LogicalPlanBuilder 
APIs more consistent
 Key: ARROW-8249
 URL: https://issues.apache.org/jira/browse/ARROW-8249
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


We now have two similar APIs with Table and LogicalPlanBuilder and although 
they are similar, there are some differences and it would be good to unify 
them. There is also code duplication and it most likely makes sense for the 
Table API to delegate to the query builder API to build logical plans.




Re: Proposal to use Black for automatic formatting of Python code

2020-03-27 Thread Bryan Cutler

+1 for using black

On Fri, Mar 27, 2020 at 11:53 AM Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:

> On Fri, 27 Mar 2020 at 18:49, Antoine Pitrou  wrote:
>
> >
> > I don't want to be the small minority opposing this so let's go for it.
> > One question though: will we continue to check Cython files using
> > flake8?
> >
>
> Yes, and I think we can continue to run flake8 on Python files as well.
> At least that is what we do in e.g. pandas. There are a few things that
> flake8 checks that Black doesn't fix automatically. For example, comments
> that are too long are not reformatted by Black, so it's good to keep
> flake8 working for that.
>
> Joris
>
>
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On Thu, 26 Mar 2020 20:37:01 +0100
> > Joris Van den Bossche  wrote:
> > > Hi all,
> > >
> > > I would like to propose adopting Black as code formatter within the
> > python
> > > project. There is an older JIRA issue about this (
> > > https://issues.apache.org/jira/browse/ARROW-5176), but bringing it to
> > the
> > > mailing list for wider attention.
> > >
> > > Black (https://github.com/ambv/black) is a tool for automatically
> > > formatting python code in ways which flake8 and our other linters
> approve
> > > of (and fill a similar role to clang-format for C++ and cmake-format
> for
> > > cmake). It can also be added to the linting checks on CI and to the
> > > pre-commit hooks like we now run flake8.
> > > Using it ensures python code will be formatted consistently, and more
> > > importantly automates this formatting, letting you focus on more
> > important
> > > matters.
> > >
> > > Black makes some specific formatting choices, and not everybody (me
> > > included) will always like those choices (that's how it goes with
> > something
> > > subjective like formatting). But my experience with using it in some
> > other
> > > big python projects (pandas, dask) has been very positive. You very
> > quickly
> > > get used to how it looks, while it is much nicer to not have to worry
> > about
> > > formatting anymore.
> > >
> > > Best,
> > > Joris
> > >
> >
> >
> >
> >
>

[jira] [Created] (ARROW-8248) vcpkg build clobbers arrow.lib from shared (.dll) with static (.lib)

2020-03-27 Thread Scott Wilson (Jira)

Scott Wilson created ARROW-8248:
---

 Summary: vcpkg build clobbers arrow.lib from shared (.dll) with 
static (.lib)
 Key: ARROW-8248
 URL: https://issues.apache.org/jira/browse/ARROW-8248
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Developer Tools
Affects Versions: 0.16.0
Reporter: Scott Wilson


After installing Arrow via vcpkg, build the library per the steps below. CMake 
builds the shared arrow library (.dll) and then the static arrow library 
(.lib). It overwrites the shared arrow.lib (exports) with the static arrow.lib. 
This results in multiple link/execution problems when using the vc projects to 
build the example projects until you realize that shared arrow needs to be 
rebuilt. (This took me two days.) 

Also, many of the projects added with the extra -D flags (beyond 
ARROW_BUILD_TESTS) don't build.

***

"C:\Program Files (x86)\Microsoft Visual 
Studio\2017\Professional\Common7\Tools\VsDevCmd.bat" -arch=amd64

cd F:\Dev\vcpkg\buildtrees\arrow\src\row-0.16.0-872c330822\cpp

mkdir build

cd build

cmake .. -G "Visual Studio 15 2017 Win64" -DARROW_BUILD_TESTS=ON 
-DARROW_BUILD_EXAMPLES=ON -DARROW_PARQUET=ON -DARROW_PYTHON=ON 
-DCMAKE_BUILD_TYPE=Debug

cmake --build . --config Debug




Re: Proposal to use Black for automatic formatting of Python code

2020-03-27 Thread Joris Van den Bossche

On Fri, 27 Mar 2020 at 18:49, Antoine Pitrou  wrote:

>
> I don't want to be the small minority opposing this so let's go for it.
> One question though: will we continue to check Cython files using
> flake8?
>

Yes, and I think we can continue to run flake8 on Python files as well.
At least that is what we do in e.g. pandas. There are a few things that
flake8 checks that Black doesn't fix automatically. For example, comments
that are too long are not reformatted by Black, so it's good to keep
flake8 working for that.
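
A minimal sketch of that kind of check, assuming a 79-character limit (the project's configured limit may differ). It is not how flake8 is implemented, and `overlong_comments` is a hypothetical helper; it only shows that a comment line that is too long can be detected even though a formatter leaves it untouched:

```python
import io
import tokenize


def overlong_comments(source, max_len=79):
    """Return (line_number, line_length) for comment lines over max_len."""
    hits = []
    lines = source.splitlines()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            row = tok.start[0]
            # Measure the whole physical line that carries the comment.
            length = len(lines[row - 1])
            if length > max_len:
                hits.append((row, length))
    return hits
```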

Joris


>
> Regards
>
> Antoine.
>
>
> On Thu, 26 Mar 2020 20:37:01 +0100
> Joris Van den Bossche  wrote:
> > Hi all,
> >
> > I would like to propose adopting Black as code formatter within the
> python
> > project. There is an older JIRA issue about this (
> > https://issues.apache.org/jira/browse/ARROW-5176), but bringing it to
> the
> > mailing list for wider attention.
> >
> > Black (https://github.com/ambv/black) is a tool for automatically
> > formatting python code in ways which flake8 and our other linters approve
> > of (and fill a similar role to clang-format for C++ and cmake-format for
> > cmake). It can also be added to the linting checks on CI and to the
> > pre-commit hooks like we now run flake8.
> > Using it ensures python code will be formatted consistently, and more
> > importantly automates this formatting, letting you focus on more
> important
> > matters.
> >
> > Black makes some specific formatting choices, and not everybody (me
> > included) will always like those choices (that's how it goes with
> something
> > subjective like formatting). But my experience with using it in some
> other
> > big python projects (pandas, dask) has been very positive. You very
> quickly
> > get used to how it looks, while it is much nicer to not have to worry
> about
> > formatting anymore.
> >
> > Best,
> > Joris
> >
>
>
>
>

[jira] [Created] (ARROW-8247) [Python] Expose Parquet writing "engine" setting in pyarrow.parquet.write_table

2020-03-27 Thread Wes McKinney (Jira)

Wes McKinney created ARROW-8247:
---

 Summary: [Python] Expose Parquet writing "engine" setting in 
pyarrow.parquet.write_table
 Key: ARROW-8247
 URL: https://issues.apache.org/jira/browse/ARROW-8247
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney
 Fix For: 0.17.0


This is a follow-up to ARROW-7741 so that we have a path to the old Parquet 
writer logic in the event that bugs are reported and we need to give users a 
workaround.




[jira] [Created] (ARROW-8246) [C++] Add -Wa,-mbig-obj when compiling with MinGW to avoid linking errors

2020-03-27 Thread Wes McKinney (Jira)

Wes McKinney created ARROW-8246:
---

 Summary: [C++] Add -Wa,-mbig-obj when compiling with MinGW to 
avoid linking errors
 Key: ARROW-8246
 URL: https://issues.apache.org/jira/browse/ARROW-8246
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.17.0


See 
https://digitalkarabela.com/mingw-w64-how-to-fix-file-too-big-too-many-sections/

This seems to be the MinGW equivalent of {{/bigobj}} in MSVC




Re: Proposal to use Black for automatic formatting of Python code

2020-03-27 Thread Antoine Pitrou



I don't want to be the small minority opposing this so let's go for it.
One question though: will we continue to check Cython files using
flake8?

Regards

Antoine.


On Thu, 26 Mar 2020 20:37:01 +0100
Joris Van den Bossche  wrote:
> Hi all,
> 
> I would like to propose adopting Black as code formatter within the python
> project. There is an older JIRA issue about this (
> https://issues.apache.org/jira/browse/ARROW-5176), but bringing it to the
> mailing list for wider attention.
> 
> Black (https://github.com/ambv/black) is a tool for automatically
> formatting python code in ways which flake8 and our other linters approve
> of (and fill a similar role to clang-format for C++ and cmake-format for
> cmake). It can also be added to the linting checks on CI and to the
> pre-commit hooks like we now run flake8.
> Using it ensures python code will be formatted consistently, and more
> importantly automates this formatting, letting you focus on more important
> matters.
> 
> Black makes some specific formatting choices, and not everybody (me
> included) will always like those choices (that's how it goes with something
> subjective like formatting). But my experience with using it in some other
> big python projects (pandas, dask) has been very positive. You very quickly
> get used to how it looks, while it is much nicer to not have to worry about
> formatting anymore.
> 
> Best,
> Joris
>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2020-03-27 Thread Wes McKinney

Looks like there is consensus about this. I'll start a vote about the
format change soon if no further comments.

On Mon, Mar 23, 2020 at 7:41 AM David Li  wrote:
>
> Hey Wes,
>
> Thanks for the review. I've broken out the format change into this PR:
> https://github.com/apache/arrow/pull/6686
>
> Best,
> David
>
> On 3/22/20, Wes McKinney  wrote:
> > hi David,
> >
> > I did a preliminary review and things look to be on the right track
> > there. What do you think about breaking out the protocol changes (and
> > adding appropriate comments) so we can have a vote on that in
> > relatively short order?
> >
> > - Wes
> >
> > On Wed, Mar 18, 2020 at 9:06 AM David Li  wrote:
> >>
> >> Following up here, I've submitted a draft implementation for C++:
> >> https://github.com/apache/arrow/pull/6656
> >>
> >> The core functionality is there, but there are still holes that I need
> >> to implement. Compared to the draft spec, the client also sends a
> >> FlightDescriptor to begin with, though it's currently not exposed.
> >> This provides consistency with DoGet/DoPut which also send a message
> >> to begin with to describe the stream to the server.
> >>
> >> Andy, I hope this helps clarify whether it meets your needs.
> >>
> >> Best,
> >> David
> >>
> >> On 2/25/20, David Li  wrote:
> >> > Hey Andy,
> >> >
> >> > I've been rather busy unfortunately. I had started on an
> >> > implementation in C++ to provide as part of this discussion, but it's
> >> > not complete. I'm hoping to have more done in March.
> >> >
> >> > Best,
> >> > David
> >> >
> >> > On 2/25/20, Andy Grove  wrote:
> >> >> I was wondering if there had been any momentum on this (the
> >> >> BiDirectional
> >> >> RPC design)?
> >> >>
> >> >> I'm interested in this for the use case of Apache Spark sending a
> >> >> stream
> >> >> of
> >> >> data to another process to invoke custom code and then receive a
> >> >> stream
> >> >> back with the transformed data.
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Andy.
> >> >>
> >> >>
> >> >>
> >> >> On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau 
> >> >> wrote:
> >> >>
> >> >>> I support moving forward with the current proposal.
> >> >>>
> >> >>> On Thu, Dec 12, 2019 at 12:20 PM David Li 
> >> >>> wrote:
> >> >>>
> >> >>> > Just following up here again, any other thoughts?
> >> >>> >
> >> >>> > I think we do have justifications for potentially separate streams
> >> >>> > in
> >> >>> > a call, but that's more of an orthogonal question - it doesn't need
> >> >>> > to
> >> >>> > be addressed here. I do agree that it very much complicates things.
> >> >>> >
> >> >>> > Thanks,
> >> >>> > David
> >> >>> >
> >> >>> > On 11/29/19, Wes McKinney  wrote:
> >> >>> > > I would generally agree with this. Note that you have the
> >> >>> > > possibility
> >> >>> > > to use unions-of-structs to send record batches with different
> >> >>> > > schemas
> >> >>> > > in the same stream, though with some added complexity on each
> >> >>> > > side
> >> >>> > >
> >> >>> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau
> >> >>> > > 
> >> >>> > wrote:
> >> >>> > >>
> >> >>> > >> I'd vote for explicitly not supported. We should keep our
> >> >>> > >> primitives
> >> >>> > >> narrow.
> >> >>> > >>
> >> >>> > >> On Wed, Nov 27, 2019, 1:17 PM David Li 
> >> >>> > >> wrote:
> >> >>> > >>
> >> >>> > >> > Thanks for the feedback.
> >> >>> > >> >
> >> >>> > >> > I do think if we had explicitly embraced gRPC from the
> >> >>> > >> > beginning,
> >> >>> > >> > there are a lot of places where things could be made more
> >> >>> > >> > ergonomic,
> >> >>> > >> > including with the metadata fields. But it would also have
> >> >>> > >> > locked us out of potential future transports.
> >> >>> > >> >
> >> >>> > >> > On another note: I hesitate to put too much into this method,
> >> >>> > >> > but
> >> >>> > >> > we
> >> >>> > >> > are looking at use cases where potentially, a client may want
> >> >>> > >> > to
> >> >>> > >> > upload multiple distinct datasets (with differing schemas).
> >> >>> > >> > (This
> >> >>> is a
> >> >>> > >> > little tentative, and I can get more details...) Right now,
> >> >>> > >> > each
> >> >>> > >> > logical stream in Flight must have a single, consistent
> >> >>> > >> > schema;
> >> >>> would
> >> >>> > >> > it make sense to look at ways to relax this, or declare this
> >> >>> > >> > explicitly out of scope (and require multiple calls and
> >> >>> > >> > coordination
> >> >>> > >> > with the deployment topology) in order to accomplish this?
> >> >>> > >> >
> >> >>> > >> > Best,
> >> >>> > >> > David
> >> >>> > >> >
> >> >>> > >> > On 11/27/19, Jacques Nadeau  wrote:
> >> >>> > >> > > Fair enough. I'm okay with the bytes approach and the
> >> >>> > >> > > proposal
> >> >>> looks
> >> >>> > >> > > good
> >> >>> > >> > > to me.
> >> >>> > >> > >
> >> >>> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li
> >> >>> > >> > > 
> >> >>> > >> > > wrote:
> >> >>> > >> > >
> >> >>> > >> > >> I've updated the

[jira] [Created] (ARROW-8245) [Python] Skip hidden directories when reading partitioned parquet files

2020-03-27 Thread Caleb Overman (Jira)

Caleb Overman created ARROW-8245:


 Summary: [Python] Skip hidden directories when reading partitioned 
parquet files
 Key: ARROW-8245
 URL: https://issues.apache.org/jira/browse/ARROW-8245
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Caleb Overman


When writing a partitioned parquet file Spark can create a temporary hidden 
`.spark-staging` directory within the parquet file. Because it is a directory 
and not a file, it is not skipped when trying to read the parquet file. Pyarrow 
currently only skips directories prefixed with `_`.
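
A minimal sketch of the proposed behaviour, not PyArrow's actual discovery code. `is_private` and `list_data_files` are hypothetical helpers, and treating every name that starts with `_` or `.` as private (files as well as directories) is an assumption made for the illustration:

```python
import os


def is_private(name):
    """True for names such as `_temporary` or `.spark-staging`."""
    return name.startswith("_") or name.startswith(".")


def list_data_files(root):
    """Walk root and return data file paths relative to it."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending
        # into private directories at all.
        dirnames[:] = sorted(d for d in dirnames if not is_private(d))
        for name in sorted(filenames):
            if not is_private(name):
                full = os.path.join(dirpath, name)
                found.append(os.path.relpath(full, root))
    return found
```

With this rule, a leftover `.spark-staging` directory is ignored in the same way as a `_`-prefixed one.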




[jira] [Created] (ARROW-8244) Add `write_to_dataset` option to populate the "file_path" metadata fields

2020-03-27 Thread Rick Zamora (Jira)

Rick Zamora created ARROW-8244:
--

 Summary: Add `write_to_dataset` option to populate the "file_path" 
metadata fields
 Key: ARROW-8244
 URL: https://issues.apache.org/jira/browse/ARROW-8244
 Project: Apache Arrow
  Issue Type: Wish
Reporter: Rick Zamora


Prior to dask#6023 (https://github.com/dask/dask/pull/6023), Dask used the 
`write_to_dataset` API to write partitioned parquet datasets. That PR switches 
to a (hopefully temporary) custom solution, because that API makes it 
difficult to populate the "file_path" column-chunk metadata fields that are 
returned within the optional `metadata_collector` kwarg. Dask needs to set 
these fields correctly in order to generate a proper global `"_metadata"` 
file.

Possible solutions to this problem:
 1. Optionally populate the file-path fields within `write_to_dataset`
 2. Always populate the file-path fields within `write_to_dataset`
 3. Return the file paths for the data written within `write_to_dataset` (up 
    to the user to manually populate the file-path fields)




[jira] [Created] (ARROW-8243) [Rust] [DataFusion] Fix inconsistent API in LogicalPlanBuilder

2020-03-27 Thread Andy Grove (Jira)

Andy Grove created ARROW-8243:
-

 Summary: [Rust] [DataFusion] Fix inconsistent API in 
LogicalPlanBuilder
 Key: ARROW-8243
 URL: https://issues.apache.org/jira/browse/ARROW-8243
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andy Grove
Assignee: Andy Grove


LogicalPlanBuilder project method takes a &Vec<Expr> whereas other methods 
take a Vec<Expr>. It makes sense to take Vec<Expr> and take ownership of these 
inputs since they are being used to build the plan.




[jira] [Created] (ARROW-8242) [C++] GCC 4.8 fails to compile Flight

2020-03-27 Thread Krisztian Szucs (Jira)

Krisztian Szucs created ARROW-8242:
--

 Summary: [C++] GCC 4.8 fails to compile Flight
 Key: ARROW-8242
 URL: https://issues.apache.org/jira/browse/ARROW-8242
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs


See recent build log 
https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=8944&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=5b4cc83a-7bb0-5664-5bb1-588f7e4dc05b&l=2186




[jira] [Created] (ARROW-8241) Add convenience methods to Schema

2020-03-27 Thread Andy Grove (Jira)

Andy Grove created ARROW-8241:
-

 Summary: Add convenience methods to Schema
 Key: ARROW-8241
 URL: https://issues.apache.org/jira/browse/ARROW-8241
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


I would like to add the following methods to Schema to make it easier to work 
with.

 
{code:java}
pub fn field_with_name(&self, name: &str) -> Result<&Field>;

pub fn index_of(&self, name: &str) -> Result<usize>;
{code}




[jira] [Created] (ARROW-8240) [Python] New FS interface (pyarrow.fs) does not seem to work correctly for HDFS (Python 3.6, pyarrow 0.16.0)

2020-03-27 Thread Yaqub Alwan (Jira)

Yaqub Alwan created ARROW-8240:
--

 Summary: [Python] New FS interface (pyarrow.fs) does not seem to 
work correctly for HDFS (Python 3.6, pyarrow 0.16.0)
 Key: ARROW-8240
 URL: https://issues.apache.org/jira/browse/ARROW-8240
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Yaqub Alwan


I'll preface this with the limited setup I had to do:


{{export CLASSPATH=$(hadoop classpath --glob)}}

{{export 
ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.1-1.cdh5.15.1.p0.4/lib64}}

 
Then I ran the following:

{{code}}
In [1]: import pyarrow.fs   

  

In [2]: c = pyarrow.fs.HadoopFileSystem()   

  

In [3]: sel = pyarrow.fs.FileSelector('/user/rwiumli')  

  

In [4]: c.get_target_stats(sel) 

  
---
OSError   Traceback (most recent call last)
 in 
> 1 c.get_target_stats(sel)

~/tmp/venv/lib/python3.6/site-packages/pyarrow/_fs.pyx in 
pyarrow._fs.FileSystem.get_target_stats()

~/tmp/venv/lib/python3.6/site-packages/pyarrow/error.pxi in 
pyarrow.lib.pyarrow_internal_check_status()

~/tmp/venv/lib/python3.6/site-packages/pyarrow/error.pxi in 
pyarrow.lib.check_status()

OSError: HDFS list directory failed, errno: 2 (No such file or directory)

In [5]: sel = pyarrow.fs.FileSelector('.')  

  

In [6]: c.get_target_stats(sel) 

  
Out[6]: 
[,
 ,
 ]

In [7]: !ls 

  
sample.py  sandeep  venv

In [8]:   
{{code}}

It looks like the new hadoop fs interface is doing a local lookup?

Ok fine...

{{code}}
In [8]: sel = pyarrow.fs.FileSelector('hdfs:///user/rwiumli') # shouldnt have 
to do this  


In [9]: c.get_target_stats(sel) 

  
hdfsGetPathInfo(hdfs:///user/rwiumli): getFileInfo error:
IllegalArgumentException: Wrong FS: hdfs:/user/rwiumli, expected: 
file:///java.lang.IllegalArgumentException: Wrong FS: hdfs:/user/rwiumli, 
expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:662)
at 
org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82)
at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:593)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:811)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:588)
at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:432)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1418)
hdfsListDirectory(hdfs:///user/rwiumli): FileSystem#listStatus error:
IllegalArgumentException: Wrong FS: hdfs:/user/rwiumli, expected: 
file:///java.lang.IllegalArgumentException: Wrong FS: hdfs:/user/rwiumli, 
expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:662)
at 
org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82)
at 
org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:410)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1566)
at

[NIGHTLY] Arrow Build Report for Job nightly-2020-03-27-0

2020-03-27 Thread Crossbow



Arrow Build Report for Job nightly-2020-03-27-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0

Failed Tasks:
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-conda-win-vs2015-py38
- gandiva-jar-trusty:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-travis-gandiva-jar-trusty
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-conda-cpp-valgrind
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-debian-ruby:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-debian-ruby
- test-ubuntu-18.04-docs:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-ubuntu-18.04-docs
- wheel-manylinux1-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-wheel-manylinux1-cp35m
- wheel-manylinux1-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-wheel-manylinux1-cp36m
- wheel-manylinux1-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-wheel-manylinux1-cp37m
- wheel-manylinux1-cp38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-azure-wheel-manylinux1-cp38

Succeeded Tasks:
- centos-6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-github-centos-6
- centos-7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-github-centos-7
- centos-8:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-github-centos-8
- debian-buster:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-github-debian-buster
- debian-stretch:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-github-debian-stretch
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-travis-gandiva-jar-osx
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-travis-homebrew-cpp
- macos-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-travis-macos-r-autobrew
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-conda-cpp
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-kartothek-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-conda-python-3.7-kartothek-latest
- test-conda-python-3.7-kartothek-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-conda-python-3.7-kartothek-master
- test-conda-python-3.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-27-0-circle-test-conda-python-3.7-spark-master
- test-conda-python-3.7-turbodbc-latest:
  URL:

Re: Proposal to use Black for automatic formatting of Python code

2020-03-27 Thread Krisztián Szűcs

+1 from me too, hopefully cython support will land eventually.

On Fri, Mar 27, 2020 at 11:33 AM Rok Mihevc  wrote:
>
> +1 for black
>
> On Fri, Mar 27, 2020 at 11:11 AM Uwe L. Korn  wrote:
>
> > I'm also very much in favor of this.
> >
> > For the black / cython support, I think the current state is reflected in
> > https://github.com/pablogsal/black/tree/cython.
> >
> > On Fri, Mar 27, 2020, at 4:40 AM, Micah Kornfield wrote:
> > > +1 from me as well.
> > >
> > > On Thursday, March 26, 2020, Neal Richardson <
> > neal.p.richard...@gmail.com>
> > > wrote:
> > >
> > > > I'm also in favor, very much so. Life is too short to hold strong
> > opinions
> > > > about code style; you get used to whatever you're accustomed to
> > seeing. And
> > > > I support using automation to remove manual nuisances like this.
> > > >
> > > > Neal
> > > >
> > > > On Thu, Mar 26, 2020 at 3:49 PM Wes McKinney 
> > wrote:
> > > >
> > > > > I'm in favor of this even though I also probably won't like some of
> > > > > the formatting decisions it makes. Is there a sense of how far away
> > > > > Black is from having Cython support? I saw it was being worked on a
> > > > > while back.
> > > > >
> > > > > On Thu, Mar 26, 2020 at 2:37 PM Joris Van den Bossche
> > > > >  wrote:
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I would like to propose adopting Black as code formatter within the
> > > > > python
> > > > > > project. There is an older JIRA issue about this (
> > > > > > https://issues.apache.org/jira/browse/ARROW-5176), but bringing
> > it to
> > > > > the
> > > > > > mailing list for wider attention.
> > > > > >
> > > > > > Black (https://github.com/ambv/black) is a tool for automatically
> > > > > > formatting python code in ways which flake8 and our other linters
> > > > approve
> > > > > > of (and fill a similar role to clang-format for C++ and
> > cmake-format
> > > > for
> > > > > > cmake). It can also be added to the linting checks on CI and to the
> > > > > > pre-commit hooks like we now run flake8.
> > > > > > Using it ensures python code will be formatted consistently, and
> > more
> > > > > > importantly automates this formatting, letting you focus on more
> > > > > important
> > > > > > matters.
> > > > > >
> > > > > > Black makes some specific formatting choices, and not everybody (me
> > > > > > included) will always like those choices (that's how it goes with
> > > > > something
> > > > > > subjective like formatting). But my experience with using it in
> > some
> > > > > other
> > > > > > big python projects (pandas, dask) has been very positive. You very
> > > > > quickly
> > > > > > get used to how it looks, while it is much nicer to not have to
> > worry
> > > > > about
> > > > > > formatting anymore.
> > > > > >
> > > > > > Best,
> > > > > > Joris
> > > > >
> > > >
> > >
> >

Re: Proposal to use Black for automatic formatting of Python code

2020-03-27 Thread Rok Mihevc

+1 for black

On Fri, Mar 27, 2020 at 11:11 AM Uwe L. Korn  wrote:

> I'm also very much in favor of this.
>
> For the black / cython support, I think the current state is reflected in
> https://github.com/pablogsal/black/tree/cython.
>
> On Fri, Mar 27, 2020, at 4:40 AM, Micah Kornfield wrote:
> > +1 from me as well.
> >
> > On Thursday, March 26, 2020, Neal Richardson <
> neal.p.richard...@gmail.com>
> > wrote:
> >
> > > I'm also in favor, very much so. Life is too short to hold strong
> opinions
> > > about code style; you get used to whatever you're accustomed to
> seeing. And
> > > I support using automation to remove manual nuisances like this.
> > >
> > > Neal
> > >
> > > On Thu, Mar 26, 2020 at 3:49 PM Wes McKinney 
> wrote:
> > >
> > > > I'm in favor of this even though I also probably won't like some of
> > > > the formatting decisions it makes. Is there a sense of how far away
> > > > Black is from having Cython support? I saw it was being worked on a
> > > > while back.
> > > >
> > > > On Thu, Mar 26, 2020 at 2:37 PM Joris Van den Bossche
> > > >  wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > I would like to propose adopting Black as code formatter within the
> > > > python
> > > > > project. There is an older JIRA issue about this (
> > > > > https://issues.apache.org/jira/browse/ARROW-5176), but bringing
> it to
> > > > the
> > > > > mailing list for wider attention.
> > > > >
> > > > > Black (https://github.com/ambv/black) is a tool for automatically
> > > > > formatting python code in ways which flake8 and our other linters
> > > approve
> > > > > of (and fill a similar role to clang-format for C++ and
> cmake-format
> > > for
> > > > > cmake). It can also be added to the linting checks on CI and to the
> > > > > pre-commit hooks like we now run flake8.
> > > > > Using it ensures python code will be formatted consistently, and
> more
> > > > > importantly automates this formatting, letting you focus on more
> > > > important
> > > > > matters.
> > > > >
> > > > > Black makes some specific formatting choices, and not everybody (me
> > > > > included) will always like those choices (that's how it goes with
> > > > something
> > > > > subjective like formatting). But my experience with using it in
> some
> > > > other
> > > > > big python projects (pandas, dask) has been very positive. You very
> > > > quickly
> > > > > get used to how it looks, while it is much nicer to not have to
> worry
> > > > about
> > > > > formatting anymore.
> > > > >
> > > > > Best,
> > > > > Joris
> > > >
> > >
> >
>

Re: Proposal to use Black for automatic formatting of Python code

2020-03-27 Thread Uwe L. Korn

I'm also very much in favor of this.

For the black / cython support, I think the current state is reflected in 
https://github.com/pablogsal/black/tree/cython.

On Fri, Mar 27, 2020, at 4:40 AM, Micah Kornfield wrote:
> +1 from me as well.
> 
> On Thursday, March 26, 2020, Neal Richardson wrote:
> 
> > I'm also in favor, very much so. Life is too short to hold strong opinions
> > about code style; you get used to whatever you're accustomed to seeing. And
> > I support using automation to remove manual nuisances like this.
> >
> > Neal
> >
> > On Thu, Mar 26, 2020 at 3:49 PM Wes McKinney wrote:
> >
> > > I'm in favor of this even though I also probably won't like some of
> > > the formatting decisions it makes. Is there a sense of how far away
> > > Black is from having Cython support? I saw it was being worked on a
> > > while back.
> > >
> > > On Thu, Mar 26, 2020 at 2:37 PM Joris Van den Bossche
> > >  wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I would like to propose adopting Black as code formatter within the
> > > python
> > > > project. There is an older JIRA issue about this (
> > > > https://issues.apache.org/jira/browse/ARROW-5176), but bringing it to
> > > the
> > > > mailing list for wider attention.
> > > >
> > > > Black (https://github.com/ambv/black) is a tool for automatically
> > > > formatting python code in ways which flake8 and our other linters
> > approve
> > > > of (and fill a similar role to clang-format for C++ and cmake-format
> > for
> > > > cmake). It can also be added to the linting checks on CI and to the
> > > > pre-commit hooks like we now run flake8.
> > > > Using it ensures python code will be formatted consistently, and more
> > > > importantly automates this formatting, letting you focus on more
> > > important
> > > > matters.
> > > >
> > > > Black makes some specific formatting choices, and not everybody (me
> > > > included) will always like those choices (that's how it goes with
> > > something
> > > > subjective like formatting). But my experience with using it in some
> > > other
> > > > big python projects (pandas, dask) has been very positive. You very
> > > quickly
> > > > get used to how it looks, while it is much nicer to not have to worry
> > > about
> > > > formatting anymore.
> > > >
> > > > Best,
> > > > Joris
> > >
> >
>

[jira] [Created] (ARROW-8239) [Java] fix param checks in splitAndTransfer method

2020-03-27 Thread Prudhvi Porandla (Jira)

Prudhvi Porandla created ARROW-8239:
---

 Summary: [Java] fix param checks in splitAndTransfer method
 Key: ARROW-8239
 URL: https://issues.apache.org/jira/browse/ARROW-8239
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Prudhvi Porandla
Assignee: Prudhvi Porandla






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (ARROW-8238) [C++][Compute] Failed to build compute tests on windows with msvc2015

2020-03-27 Thread Yibo Cai (Jira)

Yibo Cai created ARROW-8238:
---

 Summary: [C++][Compute] Failed to build compute tests on windows 
with msvc2015
 Key: ARROW-8238
 URL: https://issues.apache.org/jira/browse/ARROW-8238
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Compute
Reporter: Yibo Cai


Build Arrow compute tests on Windows10 with MSVC2015:
{code:bash}
cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DARROW_COMPUTE=ON -DARROW_BUILD_TESTS=ON ..

ninja -j3
{code}

The build fails with the following message:
{code:bash}
[311/405] Linking CXX executable release\arrow-misc-test.exe
FAILED: release/arrow-misc-test.exe
cmd.exe /C "cd . && 
C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\bin\cmake.exe -E 
vs_link_exe --intdir=src\arrow\CMakeFiles\arrow-misc-test.dir 
--rc=C:\PROGRA~2\WI3CF2~1\8.1\bin\x64\rc.exe 
--mt=C:\PROGRA~2\WI3CF2~1\8.1\bin\x64\mt.exe --manifests  -- 
C:\PROGRA~2\MICROS~1.0\VC\bin\amd64\link.exe /nologo 
src\arrow\CMakeFiles\arrow-misc-test.dir\memory_pool_test.cc.obj 
src\arrow\CMakeFiles\arrow-misc-test.dir\result_test.cc.obj 
src\arrow\CMakeFiles\arrow-misc-test.dir\pretty_print_test.cc.obj 
src\arrow\CMakeFiles\arrow-misc-test.dir\status_test.cc.obj  
/out:release\arrow-misc-test.exe /implib:release\arrow-misc-test.lib 
/pdb:release\arrow-misc-test.pdb /version:0.0  /machine:x64  
/NODEFAULTLIB:LIBCMT /INCREMENTAL:NO /subsystem:console  
release\arrow_testing.lib  release\arrow.lib  
googletest_ep-prefix\src\googletest_ep\lib\gtest_main.lib  
googletest_ep-prefix\src\googletest_ep\lib\gtest.lib  
googletest_ep-prefix\src\googletest_ep\lib\gmock.lib  
C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_filesystem.lib  
C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_system.lib  
Ws2_32.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib 
ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
LINK: command "C:\PROGRA~2\MICROS~1.0\VC\bin\amd64\link.exe /nologo 
src\arrow\CMakeFiles\arrow-misc-test.dir\memory_pool_test.cc.obj 
src\arrow\CMakeFiles\arrow-misc-test.dir\result_test.cc.obj 
src\arrow\CMakeFiles\arrow-misc-test.dir\pretty_print_test.cc.obj 
src\arrow\CMakeFiles\arrow-misc-test.dir\status_test.cc.obj 
/out:release\arrow-misc-test.exe /implib:release\arrow-misc-test.lib 
/pdb:release\arrow-misc-test.pdb /version:0.0 /machine:x64 /NODEFAULTLIB:LIBCMT 
/INCREMENTAL:NO /subsystem:console release\arrow_testing.lib release\arrow.lib 
googletest_ep-prefix\src\googletest_ep\lib\gtest_main.lib 
googletest_ep-prefix\src\googletest_ep\lib\gtest.lib 
googletest_ep-prefix\src\googletest_ep\lib\gmock.lib 
C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_filesystem.lib 
C:\Users\yibcai01\Miniconda3\envs\arrow-dev\Library\lib\boost_system.lib 
Ws2_32.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib 
oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST 
/MANIFESTFILE:release\arrow-misc-test.exe.manifest" failed (exit code 1169) 
with the following output:
arrow_testing.lib(arrow_testing.dll) : error LNK2005: "public: __cdecl 
std::vector<int,class std::allocator<int> >::vector<int,class std::allocator<int> >(class std::initializer_list<int>,class std::allocator<int> const &)" 
(??0?$vector@HV?$allocator@H@std@@@std@@QEAA@V?$initializer_list@H@1@AEBV?$allocator@H@1@@Z)
 already defined in result_test.cc.obj
arrow_testing.lib(arrow_testing.dll) : error LNK2005: "public: __cdecl 
std::vector<int,class std::allocator<int> >::~vector<int,class std::allocator<int> >(void)" 
(??1?$vector@HV?$allocator@H@std@@@std@@QEAA@XZ) already defined in 
result_test.cc.obj
arrow_testing.lib(arrow_testing.dll) : error LNK2005: "public: unsigned __int64 
__cdecl std::vector<int,class std::allocator<int> >::size(void)const " 
(?size@?$vector@HV?$allocator@H@std@@@std@@QEBA_KXZ) already defined in 
result_test.cc.obj
release\arrow-misc-test.exe : fatal error LNK1169: one or more multiply defined 
symbols found
[313/405] Building CXX object 
src\arrow\CMakeFiles\arrow-table-test.dir\table_builder_test.cc.obj
ninja: build stopped: subcommand failed.
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

