[jira] [Created] (ARROW-7659) [Rust] Reduce Rc usage
Gurwinder Singh created ARROW-7659:
--

Summary: [Rust] Reduce Rc usage
Key: ARROW-7659
URL: https://issues.apache.org/jira/browse/ARROW-7659
Project: Apache Arrow
Issue Type: Improvement
Components: Rust
Reporter: Gurwinder Singh
Assignee: Gurwinder Singh

Follow up of ARROW-7560

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7658) [R] Support dplyr filtering on date/time
Neal Richardson created ARROW-7658:
--

Summary: [R] Support dplyr filtering on date/time
Key: ARROW-7658
URL: https://issues.apache.org/jira/browse/ARROW-7658
Project: Apache Arrow
Issue Type: New Feature
Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
Fix For: 0.16.0

Plus some NSE refactoring suggested by Hadley.
[jira] [Created] (ARROW-7657) [R] Add option to preserve dictionary logical type rather than coerce to factor
Neal Richardson created ARROW-7657:
--

Summary: [R] Add option to preserve dictionary logical type rather than coerce to factor
Key: ARROW-7657
URL: https://issues.apache.org/jira/browse/ARROW-7657
Project: Apache Arrow
Issue Type: New Feature
Components: R
Reporter: Neal Richardson
Fix For: 1.0.0

See ARROW-7639. R factor "levels" must be strings, but dictionary "values" aren't restricted like that. Provide an option to govern how dictionary arrays with non-string "values" are converted to R: either keep the dictionary encoding by making the R vector a factor and coercing the dictionary values to strings, or keep the dictionary values as their original type and generate an R vector of that type, dropping the dictionary encoding.
Re: new to Arrow / integration with Kudu
On Wed, Jan 22, 2020 at 12:28 PM Shazz wrote:
>
> Thanks Wes,
>
> I will follow what is happening between Arrow and Kudu.
> In the short term, if you had to define storage for Arrow with good
> (enough) performance that is not too costly to operate, what would you
> choose? I saw there is an example of storing Parquet files on Azure
> Blob Storage; would that be OK to start with, or is there a better choice?

Many people are doing that. Note that you'll need to do some tuning
(e.g. read buffering) to obtain acceptable performance against things
like ABS.

> ---
> sh...@metaverse.fr
> GPG public key ID : B517C4C8
>
> On 21/01/2020 17:54, Wes McKinney wrote:
> > I'm interested to see an Arrow adapter for Apache Kudu developed. My
> > gut feeling is that this work should be undertaken in Kudu itself,
> > potentially having the tablet servers produce Arrow Record Batches
> > locally and send them to the client, rather than converting to
> > Kudu's own on-the-wire record format and then deserializing into
> > Arrow on the receiver side. It might be worth a conversation with
> > the Kudu community to see what they think.
> >
> > Of course, one can build an Arrow deserializer for the current Kudu
> > C++ client API and probably get pretty good performance. See also
> > ARROW-814:
> >
> > https://issues.apache.org/jira/browse/ARROW-814
> >
> > On Tue, Jan 21, 2020 at 12:32 PM Shazz wrote:
> >>
> >> Hi,
> >>
> >> I'm thinking of an architecture to store and access tabular data
> >> efficiently, and I was told to look at Arrow and Kudu.
> >> I saw on the front page a diagram where Arrow can be integrated
> >> with Kudu, but nothing in the documentation. Is there an example
> >> available somewhere?
> >>
> >> Thanks!
> >>
> >> --
> >> sh...@metaverse.fr
> >> GPG public key ID : B517C4C8
[jira] [Created] (ARROW-7656) [Python] csv.ConvertOptions Documentation Is Unclear Around Disabling Type Inference
Tim Lantz created ARROW-7656:

Summary: [Python] csv.ConvertOptions Documentation Is Unclear Around Disabling Type Inference
Key: ARROW-7656
URL: https://issues.apache.org/jira/browse/ARROW-7656
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.15.1
Environment: Documentation, N/A.
Reporter: Tim Lantz

High level description:
* The documentation [here|https://arrow.apache.org/docs/python/generated/pyarrow.csv.ConvertOptions.html#pyarrow.csv.ConvertOptions] says that setting column_types disables type inference.
* Under the hood I can see why you also need to set ReadOptions.column_names to support all current use cases, but this is unclear to new users reading the docs, especially since you can supply a Schema object to column_types in the Python bindings.
* Suggested change: update csv.ConvertOptions to note that you must also set csv.ReadOptions.column_names in order to disable type inference.
[jira] [Created] (ARROW-7655) [Python] csv.ConvertOptions Do Not Pass Through/Retain Nullability from Schema
Tim Lantz created ARROW-7655:

Summary: [Python] csv.ConvertOptions Do Not Pass Through/Retain Nullability from Schema
Key: ARROW-7655
URL: https://issues.apache.org/jira/browse/ARROW-7655
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.15.1
Environment: Reproduced on Ubuntu 18.04 and OSX Catalina in Python 3.7.4.
Reporter: Tim Lantz

Originally mentioned in: [https://github.com/apache/arrow/issues/6243]

*High level description of the issue:*
* It is possible ([though not documented|https://issues.apache.org/jira/browse/ARROW-7654]) to assign the column_types field of ConvertOptions a Schema object instead of a Dict[str, DataType].
* Expected result: the nullable attribute, in addition to the type, of the Fields in the supplied Schema is present on the Schema used when reading CSV data.
* Actual result: the Field type information is present, but nullable is lost. All fields are nullable.

*Minimal reproduction case:*
* Use case notes: this is especially noticeable when using pyarrow as a means to save data with a known schema to Parquet, as the ParquetWriter will check that the schema of a table being written matches the schema supplied to the writer. If that same schema is used to read the CSV data and contains a non-nullable field, a mismatch will be detected, resulting in the error demonstrated below.
{code:java}
$ cat test.csv
0
1
$ python
>>> import pyarrow
>>> schema = pyarrow.schema([pyarrow.field(name="foo", type=pyarrow.bool_(), nullable=False)])
>>> from pyarrow import csv
>>> read_options = csv.ReadOptions(column_names=["foo"])
>>> convert_options = csv.ConvertOptions(column_types=schema)
>>> table = csv.read_csv("test.csv", convert_options=convert_options, read_options=read_options)
>>> schema
foo: bool not null
>>> table.schema
foo: bool
>>> from pyarrow import parquet as pq
>>> writer = pq.ParquetWriter("test.parquet", schema)
>>> writer.write_table(table)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "(REDACTED)/lib/python3.7/site-packages/pyarrow-0.15.1-py3.7-macosx-10.9-x86_64.egg/pyarrow/parquet.py", line 472, in write_table
    raise ValueError(msg)
ValueError: Table schema does not match schema used to create file:
table: foo: bool vs. file: foo: bool not null
>>> pyarrow.__version__
'0.15.1'
>>> exit()
$ python --version
Python 3.7.4
{code}

* As a side note: if I don't set column_names in read_options when calling read_csv, but I do set convert_options with column_types, type inference is still performed, which seems like a bug versus what the docs state. That seems like a related but independent bug; I haven't searched yet to see if it is a known issue, but if someone reading this believes it should be filed with a repro case, I am happy to help! I only noticed this when minimizing the repro case, as my original code was setting column_names.

*Potential source of issue:*
* I did not yet look at how hard it is to fix, but I note that [here|https://github.com/apache/arrow/blob/ace72c2afa6b7608bca9ba858fdd10b23e7f2dbf/python/pyarrow/_csv.pyx#L411] only the name and type are passed down from a Field.
[jira] [Created] (ARROW-7654) [Python] Ability to Set column_types to a Schema in csv.ConvertOptions is Undocumented
Tim Lantz created ARROW-7654:

Summary: [Python] Ability to Set column_types to a Schema in csv.ConvertOptions is Undocumented
Key: ARROW-7654
URL: https://issues.apache.org/jira/browse/ARROW-7654
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.15.1, 0.12.0
Environment: N/A, documentation issue.
Reporter: Tim Lantz

Originally mentioned in: [https://github.com/apache/arrow/issues/6243]

High level description:
* As of [this commit|https://github.com/apache/arrow/commit/df54da211448b5202aa08ed2b245eb78cfd1e50c], support for supplying a Schema to ConvertOptions in the csv module was added (extremely useful, I'll add!).
* As of 0.15.1, the [published documentation|https://arrow.apache.org/docs/python/generated/pyarrow.csv.ConvertOptions.html#pyarrow.csv.ConvertOptions] only explains that a dictionary from field name to DataType can be supplied.

Minimal reproduction: N/A, see link.
[jira] [Created] (ARROW-7653) [C++][Dataset] Handle DictType index mismatch better
Francois Saint-Jacques created ARROW-7653:
-

Summary: [C++][Dataset] Handle DictType index mismatch better
Key: ARROW-7653
URL: https://issues.apache.org/jira/browse/ARROW-7653
Project: Apache Arrow
Issue Type: Improvement
Components: C++ - Dataset
Reporter: Francois Saint-Jacques

A schema incompatibility will be raised if the index width doesn't match across fragments/sources.
[jira] [Created] (ARROW-7652) [Python] Insert implicit cast in ScannerBuilder.filter
Joris Van den Bossche created ARROW-7652:

Summary: [Python] Insert implicit cast in ScannerBuilder.filter
Key: ARROW-7652
URL: https://issues.apache.org/jira/browse/ARROW-7652
Project: Apache Arrow
Issue Type: Bug
Components: Python
Reporter: Joris Van den Bossche
[jira] [Created] (ARROW-7651) [CI][Crossbow] Nightly macOS wheel builds fail
Neal Richardson created ARROW-7651:
--

Summary: [CI][Crossbow] Nightly macOS wheel builds fail
Key: ARROW-7651
URL: https://issues.apache.org/jira/browse/ARROW-7651
Project: Apache Arrow
Issue Type: Bug
Components: Continuous Integration, Packaging, Python
Reporter: Neal Richardson
Fix For: 0.16.0

See https://travis-ci.org/ursa-labs/crossbow/builds/640350008 for example:

{code}
$ install_wheel arrow
~/build/ursa-labs/crossbow/arrow ~/build/ursa-labs/crossbow
ERROR: You must give at least one requirement to install (see "pip help install")
{code}

cc [~kszucs] [~apitrou]
[jira] [Created] (ARROW-7650) [C++] Dataset tests not built on Windows
Antoine Pitrou created ARROW-7650:
-

Summary: [C++] Dataset tests not built on Windows
Key: ARROW-7650
URL: https://issues.apache.org/jira/browse/ARROW-7650
Project: Apache Arrow
Issue Type: Bug
Components: C++, C++ - Dataset
Reporter: Antoine Pitrou

They are explicitly disabled in {{cpp/src/arrow/dataset/CMakeLists.txt}}. Also, if we re-enable them, there are many compile errors (on VS 2017).
[jira] [Created] (ARROW-7649) [Python] Expose dataset PartitioningFactory.inspect ?
Joris Van den Bossche created ARROW-7649:

Summary: [Python] Expose dataset PartitioningFactory.inspect ?
Key: ARROW-7649
URL: https://issues.apache.org/jira/browse/ARROW-7649
Project: Apache Arrow
Issue Type: Bug
Components: Python
Reporter: Joris Van den Bossche

In C++, PartitioningFactory has an {{Inspect}} method which, given a path, will infer the schema. We could expose this in Python as well; it could e.g. be used to easily explore or illustrate what types are inferred from a path (int32, string).
[NIGHTLY] Arrow Build Report for Job nightly-2020-01-22-0
Arrow Build Report for Job nightly-2020-01-22-0

All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0

Failed Tasks:
- conda-win-vs2015-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-azure-conda-win-vs2015-py38
- gandiva-jar-osx:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-travis-gandiva-jar-osx
- test-conda-python-3.7-spark-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-circle-test-conda-python-3.7-spark-master
- test-ubuntu-fuzzit-fuzzing:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-circle-test-ubuntu-fuzzit-fuzzing
- test-ubuntu-fuzzit-regression:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-circle-test-ubuntu-fuzzit-regression
- wheel-osx-cp27m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-travis-wheel-osx-cp27m
- wheel-osx-cp35m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-travis-wheel-osx-cp35m
- wheel-osx-cp36m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-travis-wheel-osx-cp36m
- wheel-osx-cp37m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-travis-wheel-osx-cp37m
- wheel-osx-cp38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-travis-wheel-osx-cp38

Succeeded Tasks:
- centos-6:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-azure-centos-6
- centos-7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-azure-centos-7
- centos-8:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-azure-centos-8
- conda-linux-gcc-py27:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-azure-conda-linux-gcc-py27
- conda-linux-gcc-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py27:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-azure-conda-osx-clang-py27
- conda-osx-clang-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-azure-conda-osx-clang-py38
- debian-buster:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-azure-debian-buster
- debian-stretch:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-azure-debian-stretch
- gandiva-jar-trusty:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-travis-gandiva-jar-trusty
- homebrew-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-travis-homebrew-cpp
- macos-r-autobrew:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-travis-macos-r-autobrew
- test-conda-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-circle-test-conda-cpp
- test-conda-python-2.7-pandas-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-circle-test-conda-python-2.7-pandas-latest
- test-conda-python-2.7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-circle-test-conda-python-2.7
- test-conda-python-3.6:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-circle-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-pandas-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-22-0-circle-test-conda-python-3.7-pandas-latest
[jira] [Created] (ARROW-7648) [C++] Sanitize local paths on Windows
Antoine Pitrou created ARROW-7648:
-

Summary: [C++] Sanitize local paths on Windows
Key: ARROW-7648
URL: https://issues.apache.org/jira/browse/ARROW-7648
Project: Apache Arrow
Issue Type: Wish
Components: C++
Reporter: Antoine Pitrou

One way or the other, we should try to sanitize local filesystem paths on Windows by converting backslashes into regular slashes. One place to do it is {{FileSystemFromUri}}. One complication is that backslash-separated paths can fail parsing as a URI, but we only want to sanitize a path if we detect it's a local path (by parsing the URI). Perhaps trying the sanitization on parse error would work.
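A minimal sketch of the proposed sanitization, written in Python for illustration (the real change would live in the C++ filesystem layer, and the helper name is made up):

```python
def sanitize_local_path(path: str) -> str:
    # Convert Windows-style backslash separators into the forward slashes
    # that the URI parser expects; per the issue, this should only be
    # applied once the path is known to be a local filesystem path.
    return path.replace("\\", "/")

assert sanitize_local_path("C:\\data\\file.parquet") == "C:/data/file.parquet"
```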
[jira] [Created] (ARROW-7647) Problem with read_json and arrays
Johan Forsberg created ARROW-7647:
-

Summary: Problem with read_json and arrays
Key: ARROW-7647
URL: https://issues.apache.org/jira/browse/ARROW-7647
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.15.1
Environment: Ubuntu Linux 18.04, Python 3.7.5
Reporter: Johan Forsberg

Hi! I'm trying to load some nested JSON data and am running into a problem with arrays. I can reproduce it with a slightly modified example from the documentation:

{code:python}
from pyarrow import json
import pyarrow as pa

with open("test.json", "w") as f:
    test_json = """{"a": [1], "b": {"c": true, "d": "1991-02-03"}}
{"a": [], "b": {"c": false, "d": "2019-04-01"}}
"""
    f.write(test_json)

json.read_json("test.json")
{code}

Running this code with pyarrow 0.15.1 (I also tried 0.14) gives the following error:

{code:java}
Traceback (most recent call last):
  File "issue.py", line 11, in <module>
    ccs = json.read_json("test.json")
  File "pyarrow/_json.pyx", line 195, in pyarrow._json.read_json
  File "pyarrow/public-api.pxi", line 285, in pyarrow.lib.pyarrow_wrap_table
  File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 0 named a expected length 2 but got length 1
{code}

I've tried various combinations, and it seems the error only appears when the *total* number of elements across all the "a" arrays is less than the number of *rows* in the file. I did not expect any relationship between those and have found nothing in the documentation about it. Is this intentional? If not, I suspect there's a problem in the validation step.
[jira] [Created] (ARROW-7646) [C++][Dataset] Ability to restrict Hive partitioning to certain fields
Krisztian Szucs created ARROW-7646:
--

Summary: [C++][Dataset] Ability to restrict Hive partitioning to certain fields
Key: ARROW-7646
URL: https://issues.apache.org/jira/browse/ARROW-7646
Project: Apache Arrow
Issue Type: New Feature
Components: C++ - Dataset
Reporter: Krisztian Szucs

I can imagine use cases where the user wants only a subset of the fields discovered by the HivePartitioningFactory. It would look like the following at the Python user-level API:

{code:python}
partitioning(field_names=[...], flavor='hive')
{code}
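What restricting discovery to certain fields would mean can be sketched with a hypothetical helper that keeps only the requested key=value path segments (this is an illustration of the semantics, not the dataset API):

```python
def parse_hive_segments(path, field_names=None):
    # Hive-style partition paths encode fields as key=value segments,
    # e.g. /year=2020/month=01/part-0.parquet. With field_names given,
    # only the requested keys are kept.
    fields = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            if field_names is None or key in field_names:
                fields[key] = value
    return fields

assert parse_hive_segments("/year=2020/month=01/part-0.parquet",
                           field_names=["year"]) == {"year": "2020"}
```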
[jira] [Created] (ARROW-7645) [Packaging][deb][RPM] arm64 build by crossbow is broken
Kouhei Sutou created ARROW-7645:
---

Summary: [Packaging][deb][RPM] arm64 build by crossbow is broken
Key: ARROW-7645
URL: https://issues.apache.org/jira/browse/ARROW-7645
Project: Apache Arrow
Issue Type: Improvement
Components: Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou