[jira] [Created] (ARROW-9076) [Rust] Async CSV reader

2020-06-08 Thread Sergey Todyshev (Jira)
Sergey Todyshev created ARROW-9076:
--

 Summary: [Rust] Async CSV reader
 Key: ARROW-9076
 URL: https://issues.apache.org/jira/browse/ARROW-9076
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Sergey Todyshev


The rust-csv crate recently added an async implementation of its CSV reader.
It would be nice to have one in the arrow crate as well. It is extremely
useful in an application that needs to parse large CSV files in WebAssembly.

It would be nice to have an async JSON reader as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9075) [C++] Optimize Filter implementation

2020-06-08 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-9075:
---

 Summary: [C++] Optimize Filter implementation
 Key: ARROW-9075
 URL: https://issues.apache.org/jira/browse/ARROW-9075
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
Assignee: Wes McKinney
 Fix For: 1.0.0


I split this off from ARROW-5760 





float 16

2020-06-08 Thread Pierre Belzile
Hi,

There seem to be two competing standards for floats with 16 bits:

   - bfloat16: https://en.wikipedia.org/wiki/Bfloat16_floating-point_format
   - IEEE: https://en.wikipedia.org/wiki/IEEE_754-2008_revision

Was there any thought on how this could be handled? Would it make sense to
add some kind of DataType attribute to the HALF_FLOAT?
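
To make the difference concrete, here is a minimal Python sketch of the two
bit layouts. It uses only the standard library, and the helper names are
hypothetical (they are not part of any Arrow API). IEEE 754 binary16 has
1 sign, 5 exponent and 10 fraction bits, whereas bfloat16 has 1 sign,
8 exponent and 7 fraction bits, i.e. the upper half of a float32.

```python
import struct


def ieee_half_bits(x: float) -> int:
    """Encode x as IEEE 754 binary16 (1 sign, 5 exponent, 10 fraction bits)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def bfloat16_bits(x: float) -> int:
    """Encode x as bfloat16 by keeping the upper 16 bits of its float32 form.

    Plain truncation is used here for simplicity; real implementations
    often round to nearest even instead.
    """
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16


print(hex(ieee_half_bits(1.0)))  # 0x3c00
print(hex(bfloat16_bits(1.0)))   # 0x3f80
```

The same value gets a different 16-bit pattern in each format, so a type
would have to record which of the two encodings its buffers hold.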

Cheers, Pierre


[jira] [Created] (ARROW-9074) [GLib] Add missing arrow-json check

2020-06-08 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-9074:
---

 Summary: [GLib] Add missing arrow-json check
 Key: ARROW-9074
 URL: https://issues.apache.org/jira/browse/ARROW-9074
 Project: Apache Arrow
  Issue Type: Improvement
  Components: GLib
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-9073) [C++] RapidJSON include directory detection doesn't work with RapidJSONConfig.cmake

2020-06-08 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-9073:
---

 Summary: [C++] RapidJSON include directory detection doesn't work 
with RapidJSONConfig.cmake
 Key: ARROW-9073
 URL: https://issues.apache.org/jira/browse/ARROW-9073
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-9072) [C++][Gandiva][MinGW] Enable crashed tests

2020-06-08 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-9072:
---

 Summary: [C++][Gandiva][MinGW] Enable crashed tests
 Key: ARROW-9072
 URL: https://issues.apache.org/jira/browse/ARROW-9072
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Gandiva
Reporter: Kouhei Sutou


Some Gandiva tests crash with MinGW. They are disabled in 
{{ci/scripts/cpp_test.sh}}.

We should fix the causes of the crashes and re-enable these tests.





Re: [DISCUSS] Move JIRA notifications to separate mailing list?

2020-06-08 Thread Wes McKinney
I'm openly not very sympathetic toward people who don't take time to
set up e-mail filters but I support having two e-mail lists:

* One having new issues only. I think that active developers need to
see new issues to create awareness of what others are doing in the
project, so I think we should really encourage people to subscribe to
this list (and set up an e-mail filter if they don't want the e-mails
coming into their inbox). While I think having less "noise" on dev@ is
a good thing (even though it's only "noise" if you don't set up e-mail
filters) I'm concerned that this action will decrease developer
engagement in the project. There are of course other ways [1] to
subscribe to the JIRA activity feed if getting notifications in Slack
or Zulip is your thing.
* One having all JIRA traffic (i.e. what is currently at
https://lists.apache.org/list.html?iss...@arrow.apache.org)

[1]: https://github.com/ursa-labs/jira-zulip-bridge



Re: [DISCUSS] Move JIRA notifications to separate mailing list?

2020-06-08 Thread Antoine Pitrou


I would welcome a separate list, but only with notifications of new JIRA
issues.  I am not interested in generic JIRA traffic.

Regards

Antoine.




Re: [DISCUSS] Move JIRA notifications to separate mailing list?

2020-06-08 Thread Adam Szmigin

Hi Neal,

On 08/06/2020 19:43, Neal Richardson wrote:

> I've noticed that some other Apache projects have a separate mailing list
> for JIRA notifications (Spark, for example, has iss...@spark.apache.org).
> The result is that the dev@ mailing list is focused on actual discussion
> threads (like this!), votes, and other official business. Would we be
> interested in doing the same?


I have been lazy and not set up any anti-JIRA filters in the few weeks 
that I have been a member of this mailing list. Deleting JIRA 
notifications has fast become the most popular activity that my email 
client sees :-).


So from the perspective of a new member of the community, I can see how 
some might find this a turn-off, and maybe even be dissuaded from 
participation - obviously not something anyone here would want.


I'd certainly support a dedicated list for JIRA notifications.

--
Adam Szmigin



Re: [DISCUSS] Move JIRA notifications to separate mailing list?

2020-06-08 Thread Neal Richardson
And if you're like me, and this message got filtered out of your inbox
because it is from dev@ and contains "JIRA" in the subject, well, maybe
that demonstrates the problem ;)



[DISCUSS] Move JIRA notifications to separate mailing list?

2020-06-08 Thread Neal Richardson
Hi all,
I've noticed that some other Apache projects have a separate mailing list
for JIRA notifications (Spark, for example, has iss...@spark.apache.org).
The result is that the dev@ mailing list is focused on actual discussion
threads (like this!), votes, and other official business. Would we be
interested in doing the same?

In my opinion, the status quo is not great. The dev@ archives (
https://lists.apache.org/list.html?dev@arrow.apache.org) aren't that
readable/browseable to me, and if I want to see what's going on in JIRA, I
go to JIRA. In fact, the first thing I/we recommend to people signing up
for the mailing list is to set up email filters to exclude the JIRA noise.
Having a separate mailing list will make it easier for people to manage
their own information streams.

The counterargument is that moving JIRA traffic to a separate mailing list,
requiring an additional subscribe action, might mean that developers miss
out on things like new issues being created. I'm not personally worried
about this because I suspect that many of us already aren't using the
mailing list to stay on top of JIRA issues, and that those who want the
JIRA stream in their email can easily opt-in (subscribe). But I'm
interested in the community's opinions on this.

Thoughts?

Neal


[jira] [Created] (ARROW-9071) [C++] MakeArrayOfNull makes invalid ListArray

2020-06-08 Thread Zhuo Peng (Jira)
Zhuo Peng created ARROW-9071:


 Summary: [C++] MakeArrayOfNull makes invalid ListArray
 Key: ARROW-9071
 URL: https://issues.apache.org/jira/browse/ARROW-9071
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Reporter: Zhuo Peng


One way to reproduce this bug is:

>>> import pyarrow as pa
>>> a = pa.array([[1, 2]])
>>> b = pa.array([None, None], type=pa.null())
>>> t1 = pa.Table.from_arrays([a], ["a"])
>>> t2 = pa.Table.from_arrays([b], ["b"])
>>> pa.concat_tables([t1, t2], promote=True)
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "pyarrow/table.pxi", line 2138, in pyarrow.lib.concat_tables
 File "pyarrow/public-api.pxi", line 390, in pyarrow.lib.pyarrow_wrap_table
 File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 0: In chunk 1: Invalid: List child array invalid: Invalid: Buffer #1 too small in array of type int64 and length 2: expected at least 16 byte(s), got 12

(because concat_tables(promote=True) calls MakeArrayOfNull:
https://github.com/apache/arrow/blob/ec3bae18157723411bb772fca628cbd02eea5c25/cpp/src/arrow/table.cc#L647)

 

The code here seems incorrect:

[https://github.com/apache/arrow/blob/ec3bae18157723411bb772fca628cbd02eea5c25/cpp/src/arrow/array/util.cc#L218]

The length of the child array of a ListArray is not necessarily equal to the 
length of the ListArray itself.
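
As an illustration of that last point, here is a pure-Python model of the
variable-size list layout. It is only a sketch, not Arrow's implementation,
and validate_list_layout is a hypothetical helper. A list array of length n
carries n + 1 offsets, and the child only has to cover the last offset.

```python
def validate_list_layout(length, offsets, child_length):
    """Check the basic invariants of a variable-size list layout.

    A list array with `length` elements has `length + 1` offsets; element i
    spans child values offsets[i]:offsets[i + 1]. The child must hold at
    least offsets[-1] values, a number that is independent of `length`.
    """
    if len(offsets) != length + 1:
        raise ValueError(
            "expected %d offsets, got %d" % (length + 1, len(offsets)))
    if any(a > b for a, b in zip(offsets, offsets[1:])):
        raise ValueError("offsets must be non-decreasing")
    if child_length < offsets[-1]:
        raise ValueError(
            "child too short: need %d, got %d" % (offsets[-1], child_length))
    return True


# [[1, 2]]: one list element, but two child values.
validate_list_layout(length=1, offsets=[0, 2], child_length=2)

# Two null lists: every offset is zero, so an empty child is sufficient.
validate_list_layout(length=2, offsets=[0, 0, 0], child_length=0)
```

Under this model, an all-null list array of length 2 does not need a child
of length 2, which matches the observation that the two lengths are unrelated.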





[jira] [Created] (ARROW-9070) [C++] StructScalar needs field accessor methods

2020-06-08 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-9070:
--

 Summary: [C++] StructScalar needs field accessor methods
 Key: ARROW-9070
 URL: https://issues.apache.org/jira/browse/ARROW-9070
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Neal Richardson
 Fix For: 1.0.0


The minmax compute function returns a struct with fields "min" and "max". So to 
write an R binding for the {{min()}} method on arrow objects, I call "minmax" 
and then take the "min" field from the result. However, at least from my 
reading of scalar.h compared with array_nested.h, there are no 
field/GetFieldByName/etc. methods for StructScalar, so I can't get it.





[jira] [Created] (ARROW-9069) [C++] MakeArrayFromScalar can't handle struct

2020-06-08 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-9069:
--

 Summary: [C++] MakeArrayFromScalar can't handle struct
 Key: ARROW-9069
 URL: https://issues.apache.org/jira/browse/ARROW-9069
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Neal Richardson
 Fix For: 1.0.0


The R bindings translate data to/from Scalars by using the Array methods 
already implemented: to go from R object to a Scalar, it creates a length-1 
Array and then slices out the 0th element with GetScalar(); to go from Scalar 
to R object, it calls MakeArrayFromScalar and then the as.vector method on that 
Array (in R, there is no scalar type anyway, only length-1 vectors). 

This generally works fine but if I get a Struct scalar (as the minmax compute 
function returns), I can't do anything with it because MakeArrayFromScalar 
doesn't work with structs.





[jira] [Created] (ARROW-9068) [C++][Dataset] Simplify Partitioning interface

2020-06-08 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-9068:
-

 Summary: [C++][Dataset] Simplify Partitioning interface
 Key: ARROW-9068
 URL: https://issues.apache.org/jira/browse/ARROW-9068
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Francois Saint-Jacques


The `int segment` of `Partitioning::Parse` should not be exposed to the user. 
KeyValuePartitioning should be a private Impl interface, not in public headers. 

The same applies to `Partitioning::Format`.





[jira] [Created] (ARROW-9067) [C++] Create reusable branchless / vectorized index boundschecking functions

2020-06-08 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-9067:
---

 Summary: [C++] Create reusable branchless / vectorized index 
boundschecking functions
 Key: ARROW-9067
 URL: https://issues.apache.org/jira/browse/ARROW-9067
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


It is possible to do branch-free index boundschecking in batches for better 
performance. 

I am implementing this as part of the Take/Filter optimization (so please wait 
until I have PRs up for this work), but these functions can be moved somewhere 
more general purpose and used in places where we are currently boundschecking 
inside inner loops.
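
A rough sketch of the idea, in Python for brevity: rather than testing and
branching on every index, OR each per-index result into an accumulator and
inspect it once per batch. The function name is hypothetical, and Python
gives no control over machine-level branching, so this only shows the shape
of the loop, not the planned C++ implementation.

```python
def any_out_of_bounds(indices, length):
    """Return True if any index in the batch falls outside [0, length).

    Each comparison result is OR-ed into `bad`; the single check happens
    after the loop instead of once per index.
    """
    bad = 0
    for i in indices:
        bad |= (i < 0) | (i >= length)
    return bool(bad)


print(any_out_of_bounds([0, 3, 7], 8))  # False
print(any_out_of_bounds([0, 8, 2], 8))  # True
```

The inner loop then has no early exit, which is what would allow a compiler
to vectorize the equivalent C++ loop.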





[jira] [Created] (ARROW-9066) [Python] Raise correct error in isnull()

2020-06-08 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-9066:
---

 Summary: [Python] Raise correct error in isnull()
 Key: ARROW-9066
 URL: https://issues.apache.org/jira/browse/ARROW-9066
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.17.1
Reporter: Uwe Korn
Assignee: Uwe Korn








[NIGHTLY] Arrow Build Report for Job nightly-2020-06-08-0

2020-06-08 Thread Crossbow


Arrow Build Report for Job nightly-2020-06-08-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0

Failed Tasks:
- centos-7-aarch64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-travis-centos-7-aarch64
- centos-7-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-github-centos-7-amd64
- centos-8-aarch64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-travis-centos-8-aarch64
- centos-8-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-github-centos-8-amd64
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-travis-homebrew-r-autobrew
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-github-test-conda-cpp-valgrind
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-github-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-github-test-conda-python-3.7-spark-master
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-github-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-github-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.8-dask-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-github-test-conda-python-3.8-dask-master
- test-conda-python-3.8-jpype:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-github-test-conda-python-3.8-jpype
- wheel-manylinux1-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-azure-wheel-manylinux1-cp35m
- wheel-manylinux1-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-azure-wheel-manylinux1-cp36m
- wheel-manylinux1-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-azure-wheel-manylinux1-cp37m
- wheel-manylinux1-cp38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-azure-wheel-manylinux1-cp38
- wheel-manylinux2010-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-azure-wheel-manylinux2010-cp35m
- wheel-manylinux2010-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-azure-wheel-manylinux2010-cp37m
- wheel-manylinux2010-cp38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-azure-wheel-manylinux2010-cp38
- wheel-manylinux2014-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-azure-wheel-manylinux2014-cp35m
- wheel-manylinux2014-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-azure-wheel-manylinux2014-cp36m
- wheel-manylinux2014-cp38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-azure-wheel-manylinux2014-cp38

Succeeded Tasks:
- centos-6-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-github-centos-6-amd64
- conda-clean:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-azure-conda-clean
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-08-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 

[jira] [Created] (ARROW-9065) Support parsing date32 in dataset partition folders

2020-06-08 Thread Dave Hirschfeld (Jira)
Dave Hirschfeld created ARROW-9065:
--

 Summary: Support parsing date32 in dataset partition folders
 Key: ARROW-9065
 URL: https://issues.apache.org/jira/browse/ARROW-9065
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Python
Reporter: Dave Hirschfeld


I have some data which is partitioned by year/month/date. It would be useful if 
the date could be automatically parsed:
```python

In [17]: schema = pa.schema([("year", pa.int16()), ("month", pa.int8()), ("day", pa.date32())])

In [18]: partition = DirectoryPartitioning(schema)

In [19]: partition.parse("/2020/06/2020-06-08")
---
ArrowNotImplementedError Traceback (most recent call last)
 in 
> 1 partition.parse("/2020/06/2020-06-08")

~\envs\dev\lib\site-packages\pyarrow\_dataset.pyx in 
pyarrow._dataset.Partitioning.parse()

~\envs\dev\lib\site-packages\pyarrow\error.pxi in 
pyarrow.lib.pyarrow_internal_check_status()

~\envs\dev\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.check_status()

ArrowNotImplementedError: parsing scalars of type date32[day]

```


Not a big issue since you can just use string and convert, but nevertheless it 
would be nice if it Just Worked
```python

In [22]: schema = pa.schema([("year", pa.int16()), ("month", pa.int8()), ("day", pa.string())])

In [23]: partition = DirectoryPartitioning(schema)

In [24]: partition.parse("/2020/06/2020-06-08")
Out[24]: 
```
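
For reference, the manual conversion step in the string workaround could look
roughly like the sketch below. It uses only the Python standard library, not
pyarrow, and parse_partition_path is a hypothetical helper. It also shows the
days-since-epoch integer that a date32 value stores.

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def parse_partition_path(path):
    """Split a /year/month/day path and convert the segments by hand.

    Returns the year and month as ints, and the day both as a date and as
    the days-since-epoch integer used by date32.
    """
    year, month, day = path.strip("/").split("/")
    parsed_day = date.fromisoformat(day)
    return {
        "year": int(year),
        "month": int(month),
        "day": parsed_day,
        "day_as_date32": (parsed_day - EPOCH).days,
    }


result = parse_partition_path("/2020/06/2020-06-08")
print(result["day"], result["day_as_date32"])  # 2020-06-08 18421
```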





[jira] [Created] (ARROW-9064) optimization debian package manager tweaks

2020-06-08 Thread Pratik Raj (Jira)
Pratik Raj created ARROW-9064:
-

 Summary: optimization debian package manager tweaks
 Key: ARROW-9064
 URL: https://issues.apache.org/jira/browse/ARROW-9064
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Pratik Raj


By default, "apt" and "apt-get" on Ubuntu and other Debian-based systems 
install recommended (but not suggested) packages.

Passing the "--no-install-recommends" option tells apt-get not to treat 
recommended packages as dependencies to install.

This results in smaller downloads and fewer installed packages.

See the Ubuntu blog post at 
https://ubuntu.com/blog/we-reduced-our-docker-images-by-60-with-no-install-recommends





[jira] [Created] (ARROW-9063) [Python][C++] Order of files are not respected using the new pyarrow.dataset

2020-06-08 Thread William Liu (Jira)
William Liu created ARROW-9063:
--

 Summary: [Python][C++] Order of files are not respected using the 
new pyarrow.dataset
 Key: ARROW-9063
 URL: https://issues.apache.org/jira/browse/ARROW-9063
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Affects Versions: 0.17.1
 Environment: ubuntu-18.04
Reporter: William Liu


Say we have multiple parquet files under the same folder (a.parquet, b.parquet, 
c.parquet). If I pass a list of file paths (fps) into either of the two 
"ds = ..." statements below
{code:java}
import pyarrow.dataset
import pyarrow.parquet as pq
ds = pq.ParquetDataset(fps, use_legacy_dataset=False)
ds = pyarrow.dataset.dataset(fps)
{code}
then the rows of the resulting table do not follow the order of the file 
paths. For example:

......aaa......aaa...ccc..bbb...

 





[jira] [Created] (ARROW-9062) [Rust] Support to read JSON into dictionary type

2020-06-08 Thread Sven Wagner-Boysen (Jira)
Sven Wagner-Boysen created ARROW-9062:
-

 Summary: [Rust] Support to read JSON into dictionary type
 Key: ARROW-9062
 URL: https://issues.apache.org/jira/browse/ARROW-9062
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Sven Wagner-Boysen


Currently a JSON reader built from a schema that uses the dictionary type for 
one of its fields will fail with JsonError("struct types are not yet 
supported")
{code:java}
let builder = ReaderBuilder::new().with_schema(..);
let mut reader: Reader<File> =
    builder.build::<File>(File::open(path).unwrap()).unwrap();
let rb = reader.next().unwrap();
{code}
 

Suggested solution:

Support reading into a dictionary type in the JSON reader: 
[https://github.com/apache/arrow/blob/master/rust/arrow/src/json/reader.rs#L368]


