[jira] [Created] (ARROW-7583) [C++][Flight] Auth handler tests fragile on Windows

2020-01-15 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7583:
-

 Summary: [C++][Flight] Auth handler tests fragile on Windows
 Key: ARROW-7583
 URL: https://issues.apache.org/jira/browse/ARROW-7583
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC
Reporter: Antoine Pitrou


This occurs often on AppVeyor:
{code}
[--] 3 tests from TestAuthHandler
[ RUN  ] TestAuthHandler.PassAuthenticatedCalls
[   OK ] TestAuthHandler.PassAuthenticatedCalls (4 ms)
[ RUN  ] TestAuthHandler.FailUnauthenticatedCalls
..\src\arrow\flight\flight_test.cc(1126): error: Value of: status.message()
Expected: has substring "Invalid token"
  Actual: "Could not write record batch to stream: "
[  FAILED  ] TestAuthHandler.FailUnauthenticatedCalls (3 ms)
[ RUN  ] TestAuthHandler.CheckPeerIdentity
[   OK ] TestAuthHandler.CheckPeerIdentity (2 ms)
[--] 3 tests from TestAuthHandler (10 ms total)
[--] 3 tests from TestBasicAuthHandler
[ RUN  ] TestBasicAuthHandler.PassAuthenticatedCalls
[   OK ] TestBasicAuthHandler.PassAuthenticatedCalls (4 ms)
[ RUN  ] TestBasicAuthHandler.FailUnauthenticatedCalls
..\src\arrow\flight\flight_test.cc(1224): error: Value of: status.message()
Expected: has substring "Invalid token"
  Actual: "Could not write record batch to stream: "
[  FAILED  ] TestBasicAuthHandler.FailUnauthenticatedCalls (4 ms)
[ RUN  ] TestBasicAuthHandler.CheckPeerIdentity
[   OK ] TestBasicAuthHandler.CheckPeerIdentity (3 ms)
[--] 3 tests from TestBasicAuthHandler (11 ms total)
{code}

See e.g. 
https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/30110376/job/vbtd22813g5hlgfl#L2252



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7584) [Python] Improve ergonomics of new FileSystem API

2020-01-15 Thread Jira
Fabian Höring created ARROW-7584:


 Summary: [Python] Improve ergonomics of new FileSystem API
 Key: ARROW-7584
 URL: https://issues.apache.org/jira/browse/ARROW-7584
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Fabian Höring


The [new Python FileSystem API 
|https://github.com/apache/arrow/blob/master/python/pyarrow/_fs.pyx#L185] is 
nice but seems to be very verbose to use.

The documentation of the old FS API is 
[here|https://arrow.apache.org/docs/python/filesystems.html]

Here are some examples:

*File access:*

Before:
fs.ls()
fs.mkdir()
fs.rmdir()

Now:
fs.get_target_stats()
fs.create_dir()
fs.delete_dir()

What is the advantage of the longer method names? The short ones seem clear 
and are much easier to use, and renaming looks like an easy change. The short 
names are also consistent with what hdfs does in the 
[fs api|https://arrow.apache.org/docs/python/filesystems.html] and work 
naturally with a local filesystem.

*File opening:*

Before:
with fs.open(self, path, mode=u'rb', buffer_size=None)

Now:
fs.open_input_file()
fs.open_input_stream()
fs.open_output_stream()

It seems more natural to match Python's standard open function, which works for 
local file access as well. Not sure if this is possible to do easily, as there 
is `_wrap_output_stream`.

Solutions:
- If the current Python API is still unused, we could just rename the methods
- We could keep everything as is and add some alias methods, although I think 
it would make the FileSystem class a bit messy if there are always 2 methods 
to do the same work
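
To make the second option concrete, here is a minimal sketch of what alias methods could look like. The `VerboseFileSystem` stub, the `ShortNamesMixin` class and `AliasedFileSystem` are hypothetical and only stand in for the real FileSystem class; the only names taken from the lists above are `get_target_stats`, `create_dir`, `delete_dir`, `ls`, `mkdir` and `rmdir`. Mapping `ls` onto `get_target_stats` follows the pairing above and ignores any difference in arguments.

```python
class VerboseFileSystem:
    """Hypothetical in-memory stand-in for the new FileSystem API (not pyarrow)."""

    def __init__(self):
        self._dirs = set()

    def create_dir(self, path):
        self._dirs.add(path)

    def delete_dir(self, path):
        self._dirs.discard(path)

    def get_target_stats(self, paths):
        # The real method returns richer objects; this stub only reports existence.
        return [(p, p in self._dirs) for p in paths]


class ShortNamesMixin:
    """Hypothetical aliases: each short name delegates to the long one."""

    def mkdir(self, path):
        return self.create_dir(path)

    def rmdir(self, path):
        return self.delete_dir(path)

    def ls(self, paths):
        return self.get_target_stats(paths)


class AliasedFileSystem(ShortNamesMixin, VerboseFileSystem):
    pass


fs = AliasedFileSystem()
fs.mkdir("data")
print(fs.ls(["data", "missing"]))
```

Keeping the aliases in a separate mixin would at least keep the duplicated names in one place, which may limit the messiness mentioned above.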

Other considerations:
In the long run I would also enhance the FileSystem API with more methods 
that build on the basic ones to provide new features, for example:
- introduce put and get on top of the streams that directly upload/download 
files
- be able to write strings to the file streams (instead of only bytes), which 
would allow using some Python APIs like json.dump directly

```
with fs.open(path, "wb") as fd:
  res = {"a": "bc"}
  json.dump(res, fd)
```

instead of

```
with fs.open(path, "wb") as fd:
  res = {"a": "bc"}
  fd.write(json.dumps(res)) # instead of fd.write(json.dumps(res).encode())
```

or like currently (with old API, untested with new one)

```
with fs.open(path, "wb") as fd:
  res = {"a": "bc"}
  fd.write(json.dumps(res).encode())
```
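
One way to get the first form without changing the stream classes might be to wrap the binary stream in a text layer. The sketch below uses the standard library's `io.TextIOWrapper`, with `io.BytesIO` standing in for a stream opened with `fs.open(path, "wb")`. Whether the Arrow stream objects implement enough of the `io` interface for this wrapper is an assumption and is untested here; the helper name `dump_json` is made up for illustration.

```python
import io
import json


def dump_json(binary_stream, obj, encoding="utf-8"):
    """Write obj as JSON text to a binary stream without closing the stream."""
    text = io.TextIOWrapper(binary_stream, encoding=encoding)
    try:
        json.dump(obj, text)
        text.flush()  # push the encoded bytes down to the binary stream
    finally:
        # Detach so that discarding the wrapper does not close binary_stream.
        text.detach()


buf = io.BytesIO()  # stand-in for a file opened in "wb" mode
dump_json(buf, {"a": "bc"})
print(buf.getvalue())
```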






[NIGHTLY] Arrow Build Report for Job nightly-2020-01-15-0

2020-01-15 Thread Crossbow


Arrow Build Report for Job nightly-2020-01-15-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0

Failed Tasks:
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-gandiva-jar-osx
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-spark-master
- wheel-manylinux2014-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2014-cp35m

Succeeded Tasks:
- centos-6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-centos-6
- centos-7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-centos-7
- centos-8:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-centos-8
- conda-linux-gcc-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-linux-gcc-py27
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-osx-clang-py27
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-win-vs2015-py38
- debian-buster:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-debian-buster
- debian-stretch:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-debian-stretch
- gandiva-jar-trusty:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-gandiva-jar-trusty
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-homebrew-cpp
- macos-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-macos-r-autobrew
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-cpp
- test-conda-python-2.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-2.7-pandas-latest
- test-conda-python-2.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-2.7
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7
- test-conda-python-3.8-dask-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.8-dask-master
- test-conda-p

[C++] Arrow added to OSS-Fuzz

2020-01-15 Thread Antoine Pitrou


Hello,

I would like to announce that Arrow has been accepted on the OSS-Fuzz
infrastructure (a continuous fuzzing infrastructure operated by Google):
https://github.com/google/oss-fuzz/pull/3233

Right now the only fuzz targets are the C++ stream and file IPC readers.
The first build results haven't appeared yet.  They will appear on
https://oss-fuzz.com/ .   Access needs a Google account, and you also
need to be listed in the "auto_ccs" here:
https://github.com/google/oss-fuzz/blob/master/projects/arrow/project.yaml

(if you are a PMC or core developer and want to be listed, just open a
PR to the oss-fuzz repository)

Once we confirm the first builds succeed on OSS-Fuzz, we should probably
add more fuzz targets (for example for reading Parquet files).

Regards

Antoine.


[jira] [Created] (ARROW-7585) Plasma-store-server does not support --help, shows backtrace on getopt error

2020-01-15 Thread Christian Hudon (Jira)
Christian Hudon created ARROW-7585:
--

 Summary: Plasma-store-server does not support --help, shows 
backtrace on getopt error
 Key: ARROW-7585
 URL: https://issues.apache.org/jira/browse/ARROW-7585
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Plasma
Reporter: Christian Hudon


I'm trying out Plasma, using plasma-store-server. The first thing I usually do 
then is to run the binary without arguments, and that usually gives me a 
message showing usage. However, with plasma-store-server, the initial 
experience there is a backtrace:
{noformat}
$ ./debug/plasma-store-server
/Users/chrish/Code/arrow/cpp/src/plasma/store.cc:1237: please specify socket 
for incoming connections with -s switch
0   plasma-store-server 0x00010b4d7c04 
_ZN5arrow4util7CerrLog14PrintBackTraceEv + 52
1   plasma-store-server 0x00010b4d7b24 
_ZN5arrow4util7CerrLogD2Ev + 100
2   plasma-store-server 0x00010b4d7a85 
_ZN5arrow4util7CerrLogD1Ev + 21
3   plasma-store-server 0x00010b4d7aa9 
_ZN5arrow4util7CerrLogD0Ev + 25
4   plasma-store-server 0x00010b4d7990 
_ZN5arrow4util8ArrowLogD2Ev + 80
5   plasma-store-server 0x00010b4d79c5 
_ZN5arrow4util8ArrowLogD1Ev + 21
6   plasma-store-server 0x00010b463152 main + 1122
7   libdyld.dylib   0x7fff7765a3d5 start + 1
fish: './debug/plasma-store-server' terminated by signal SIGABRT (Abort)
{noformat}
Also, neither of the "h" or "help" command-line switches is supported, and so 
to start plasma-store-server, you either find the doc, or iteratively add 
arguments until you stop getting "please specify ..." backtraces.

I know it's not a big thing, but it'd be nice if that initial experience was a 
little bit more user-friendly. Also submitting this because it feels like a 
good first time issue, so I would be very happy to do the work, and would like 
to tackle it. I'd like to 1) add --help support that shows all the options and 
gives an example with the required ones, and 2) remove the unnecessary 
backtraces on normal errors like these in the main() function.

Just asking beforehand here: 1) would this kind of patch be welcome, and 2) is 
there a C++ library for command-line option parsing that I could use? I can 
find one on my own, but I'd rather ask here which one would be approved for 
use in the Arrow codebase... or should I just stick to getopt() and do things 
manually? Thanks!
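
The behaviour asked for here (a usage message for --help, and a plain error instead of a backtrace when a required option is missing) can be illustrated with Python's `argparse`. This is only a sketch of the intended user experience, not the C++ change itself. The `-s` switch comes from the error message above; the long name `--socket`, the `build_parser` helper and the example path are assumptions.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="plasma-store-server",
        description="Illustrative option parsing only.",
        epilog="example: plasma-store-server -s /tmp/plasma",
    )
    # -s is named in the error message above; --socket is a hypothetical long form.
    parser.add_argument(
        "-s", "--socket", required=True,
        help="socket for incoming connections",
    )
    return parser


# With -h/--help the parser prints usage and exits with status 0. With -s
# missing it prints a short error plus usage and exits with status 2, and
# no backtrace is shown.
args = build_parser().parse_args(["-s", "/tmp/plasma"])
print("listening on", args.socket)
```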





[jira] [Created] (ARROW-7586) [C++][Dataset] Read feather files

2020-01-15 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7586:
--

 Summary: [C++][Dataset] Read feather files
 Key: ARROW-7586
 URL: https://issues.apache.org/jira/browse/ARROW-7586
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Dataset
Reporter: Neal Richardson
Assignee: Ben Kietzman
 Fix For: 0.16.0








0.16 release preparation

2020-01-15 Thread Neal Richardson
Hi all,
As a reminder, we're trying to close out work on 0.16 this week so that we
can release next week. As you can see on
https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.16.0+Release,
we're making good progress. That said, there are still a decent number of
open/in-progress issues tagged for 0.16, probably more than can
realistically land this week.

I'd encourage everyone to have a look at that board and do a few things:

* For tickets that you're "assigned" to (i.e. you're working on them), if
you're not expecting to merge them this week, please push the fixVersion
back to 1.0.0 so they're out of scope. If they land in time, great, but
let's move the nonessential tasks out of the way.
* For tickets that you reported and haven't been picked up, please review
whether you think you can do them this week (or know someone who can), and
bump to 1.0 those that aren't going to happen.
* Otherwise review the areas of the project that you work on and make sure
things are release-ready (including documentation)

Also, as I mentioned yesterday, I've started a draft release announcement
here: https://github.com/apache/arrow-site/pull/41. Please fill in where
you can.

Looking forward to a smooth release. Thanks to everyone for pitching in.

Neal


Re: [C++] Arrow added to OSS-Fuzz

2020-01-15 Thread Fan Liya
Hi Antoine,

Good job! And thanks for sharing the great news!

Best,
Liya Fan

On Thu, Jan 16, 2020 at 2:59 AM Antoine Pitrou wrote:

>
> Hello,
>
> I would like to announce that Arrow has been accepted on the OSS-Fuzz
> infrastructure (a continuous fuzzing infrastructure operated by Google):
> https://github.com/google/oss-fuzz/pull/3233
>
> Right now the only fuzz targets are the C++ stream and file IPC readers.
> The first build results haven't appeared yet.  They will appear on
> https://oss-fuzz.com/ .   Access needs a Google account, and you also
> need to be listed in the "auto_ccs" here:
> https://github.com/google/oss-fuzz/blob/master/projects/arrow/project.yaml
>
> (if you are a PMC or core developer and want to be listed, just open a
> PR to the oss-fuzz repository)
>
> Once we confirm the first builds succeed on OSS-Fuzz, we should probably
> add more fuzz targets (for example for reading Parquet files).
>
> Regards
>
> Antoine.
>


[jira] [Created] (ARROW-7587) [C++][Compute] Add Top-k kernel

2020-01-15 Thread Yibo Cai (Jira)
Yibo Cai created ARROW-7587:
---

 Summary: [C++][Compute] Add Top-k kernel
 Key: ARROW-7587
 URL: https://issues.apache.org/jira/browse/ARROW-7587
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++ - Compute
Reporter: Yibo Cai
Assignee: Yibo Cai


Add a kernel to get the top k smallest or largest elements (indices).
std::partial_sort should be a better solution than sorting everything and then 
picking the top k.
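
As a rough illustration of why partial selection helps, the sketch below returns the indices of the k smallest or largest values without fully sorting the input. It is written in Python with `heapq` rather than `std::partial_sort`, so it shows the idea and not the proposed C++ kernel; the function name `top_k_indices` and its `largest` parameter are assumptions.

```python
import heapq


def top_k_indices(values, k, largest=True):
    """Return indices of the k largest (or smallest) values, best first."""
    select = heapq.nlargest if largest else heapq.nsmallest
    # Selecting over indices keyed by value costs roughly O(n log k),
    # compared with O(n log n) for sorting the whole input first.
    return select(k, range(len(values)), key=values.__getitem__)


values = [5, 1, 9, 3, 7]
print(top_k_indices(values, 2))
print(top_k_indices(values, 2, largest=False))
```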





[jira] [Created] (ARROW-7588) [Plasma] Plasma On YARN

2020-01-15 Thread Ferdinand Xu (Jira)
Ferdinand Xu created ARROW-7588:
---

 Summary: [Plasma] Plasma On YARN
 Key: ARROW-7588
 URL: https://issues.apache.org/jira/browse/ARROW-7588
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++ - Plasma
Reporter: Ferdinand Xu


YARN is widely used as a resource manager. Currently the Plasma server runs as 
an external service for memory sharing across different clients and is not 
managed by YARN. The resources used by Plasma should also be managed. 
Additionally, the Plasma service itself could be managed by YARN.





[jira] [Created] (ARROW-7589) [C++][Gandiva] Calling castVarchar java sometimes results in segmentation fault for input length 0

2020-01-15 Thread Projjal Chanda (Jira)
Projjal Chanda created ARROW-7589:
-

 Summary: [C++][Gandiva] Calling castVarchar java sometimes results 
in segmentation fault for input length 0
 Key: ARROW-7589
 URL: https://issues.apache.org/jira/browse/ARROW-7589
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Projjal Chanda
Assignee: Projjal Chanda





