[jira] [Updated] (ARROW-7059) [Python] Reading parquet file with many columns is much slower in 0.15.x versus 0.14.x

2019-11-28 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7059: - Labels: parquet performance (was: performance) > [Python] Reading parquet file w

[jira] [Commented] (ARROW-6876) [Python] Reading parquet file with many columns becomes slow for 0.15.0

2019-11-28 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984268#comment-16984268 ] Joris Van den Bossche commented on ARROW-6876: -- The open issue about this is

[jira] [Updated] (ARROW-7059) [Python] Reading parquet file with many columns is much slower in 0.15.x versus 0.14.x

2019-11-28 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7059: - Description: Reading Parquet files with a large number of columns still seems to be

[jira] [Created] (ARROW-7273) [Python] Non-nullable null field is allowed / crashes when writing to parquet

2019-11-28 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7273: Summary: [Python] Non-nullable null field is allowed / crashes when writing to parquet Key: ARROW-7273 URL: https://issues.apache.org/jira/browse/ARROW-7273

[jira] [Updated] (ARROW-7273) [Python] Non-nullable null field is allowed / crashes when writing to parquet

2019-11-28 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7273: - Labels: parquet (was: ) > [Python] Non-nullable null field is allowed / crashes

[jira] [Updated] (ARROW-7282) [Python] let ArrowIOError subclass from FileNotFoundError when appropriate

2019-12-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7282: - Summary: [Python] let ArrowIOError subclass from FileNotFoundError when appropria

[jira] [Assigned] (ARROW-7296) [Python] Add ORC api documentation

2019-12-04 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-7296: Assignee: Joris Van den Bossche > [Python] Add ORC api documentation > ---

[jira] [Updated] (ARROW-7305) [Python] High memory usage writing pyarrow.Table to parquet

2019-12-04 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7305: - Labels: parquet (was: ) > [Python] High memory usage writing pyarrow.Table to pa

[jira] [Updated] (ARROW-7305) [Python] High memory usage writing pyarrow.Table with large strings to parquet

2019-12-04 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7305: - Summary: [Python] High memory usage writing pyarrow.Table with large strings to p

[jira] [Assigned] (ARROW-7314) [Python] Compiler warning in pyarrow

2019-12-05 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-7314: Assignee: Joris Van den Bossche > [Python] Compiler warning in pyarrow > -

[jira] [Commented] (ARROW-842) [Python] Handle more kinds of null sentinel objects from pandas 0.x

2019-12-05 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988909#comment-16988909 ] Joris Van den Bossche commented on ARROW-842: - The next pandas release is also

[jira] [Closed] (ARROW-7220) [CI][Python] Github actions macOS python 3 build is using python 2

2019-12-09 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche closed ARROW-7220. Resolution: Duplicate > [CI][Python] Github actions macOS python 3 build is using p

[jira] [Updated] (ARROW-7345) [Python] Writing partitions with NaNs silently drops data

2019-12-09 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7345: - Labels: dataset parquet (was: parquet) > [Python] Writing partitions with NaNs s

[jira] [Updated] (ARROW-7345) [Python] Writing partitions with NaNs silently drops data

2019-12-09 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7345: - Labels: parquet (was: ) > [Python] Writing partitions with NaNs silently drops d

[jira] [Commented] (ARROW-7345) [Python] Writing partitions with NaNs silently drops data

2019-12-09 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16991615#comment-16991615 ] Joris Van den Bossche commented on ARROW-7345: -- [~karldw] you are correct th

[jira] [Updated] (ARROW-7363) [Python] flatten() doesn't work on ChunkedArray

2019-12-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7363: - Summary: [Python] flatten() doesn't work on ChunkedArray (was: flatten() doesn't

[jira] [Updated] (ARROW-7363) [Python] flatten() doesn't work on ChunkedArray

2019-12-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7363: - Component/s: Python > [Python] flatten() doesn't work on ChunkedArray > -

[jira] [Commented] (ARROW-7363) [Python] flatten() doesn't work on ChunkedArray

2019-12-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992519#comment-16992519 ] Joris Van den Bossche commented on ARROW-7363: -- From looking at the code, I

[jira] [Updated] (ARROW-7266) [Python] dictionary_encode() of a slice gives wrong result

2019-12-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7266: - Fix Version/s: 1.0.0 > [Python] dictionary_encode() of a slice gives wrong result

[jira] [Updated] (ARROW-7041) [Python] PythonLibs setting found by CMake uses wrong version of Python on macOS

2019-12-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7041: - Component/s: Python > [Python] PythonLibs setting found by CMake uses wrong versi

[jira] [Commented] (ARROW-7362) [Python] ListArray.flatten() should take care of slicing offsets

2019-12-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992524#comment-16992524 ] Joris Van den Bossche commented on ARROW-7362: -- There was some discussion ab

[jira] [Updated] (ARROW-7365) [Python] Support FixedSizeList type in conversion to numpy/pandas

2019-12-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7365: - Fix Version/s: 1.0.0 > [Python] Support FixedSizeList type in conversion to numpy

[jira] [Created] (ARROW-7365) [Python] Support FixedSizeList type in conversion to numpy/pandas

2019-12-10 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7365: Summary: [Python] Support FixedSizeList type in conversion to numpy/pandas Key: ARROW-7365 URL: https://issues.apache.org/jira/browse/ARROW-7365 Proje

[jira] [Updated] (ARROW-7336) [C++] Implement MinMax options to not skip nulls

2019-12-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7336: - Summary: [C++] Implement MinMax options to not skip nulls (was: implement minmax

[jira] [Commented] (ARROW-7350) [Python] Parquet file metadata min and max statistics not decoded from bytes for Decimal data types

2019-12-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992597#comment-16992597 ] Joris Van den Bossche commented on ARROW-7350: -- [~max.firman] Thanks for the

[jira] [Commented] (ARROW-6775) Proposal for several Array utility functions

2019-12-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992607#comment-16992607 ] Joris Van den Bossche commented on ARROW-6775: -- [~brillsp] thanks for openin

[jira] [Commented] (ARROW-7362) [Python] ListArray.flatten() should take care of slicing offsets

2019-12-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992631#comment-16992631 ] Joris Van den Bossche commented on ARROW-7362: -- Yes, the main thing is that

[jira] [Commented] (ARROW-7362) [Python] ListArray.flatten() should take care of slicing offsets

2019-12-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992636#comment-16992636 ] Joris Van den Bossche commented on ARROW-7362: -- Another option could be to a

[jira] [Commented] (ARROW-7375) [Python] Expose C++ MakeArrayOfNull

2019-12-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994402#comment-16994402 ] Joris Van den Bossche commented on ARROW-7375: -- It could maybe also be a {{p

[jira] [Updated] (ARROW-7376) parquet NaN/null double statistics can result in endless loop

2019-12-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7376: - Component/s: C++ > parquet NaN/null double statistics can result in endless loop

[jira] [Updated] (ARROW-7376) parquet NaN/null double statistics can result in endless loop

2019-12-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7376: - Labels: parquet (was: ) > parquet NaN/null double statistics can result in endle

[jira] [Updated] (ARROW-7080) [Python][Parquet] Expose parquet field_id in Schema objects

2019-12-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7080: - Labels: parquet (was: ) > [Python][Parquet] Expose parquet field_id in Schema ob

[jira] [Assigned] (ARROW-6996) [Python] Expose boolean filter kernel on Table

2019-12-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-6996: Assignee: Joris Van den Bossche > [Python] Expose boolean filter kernel on

[jira] [Created] (ARROW-7430) [Python] Add more docstrings to dataset bindings

2019-12-18 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7430: Summary: [Python] Add more docstrings to dataset bindings Key: ARROW-7430 URL: https://issues.apache.org/jira/browse/ARROW-7430 Project: Apache Arrow

[jira] [Updated] (ARROW-7430) [Python] Add more docstrings to dataset bindings

2019-12-18 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7430: - Labels: dataset (was: pull-request-available) > [Python] Add more docstrings to

[jira] [Updated] (ARROW-7431) [Python] Add dataset API to reference docs

2019-12-18 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7431: - Labels: dataset (was: ) > [Python] Add dataset API to reference docs > -

[jira] [Updated] (ARROW-7431) [Python] Add dataset API to reference docs

2019-12-18 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7431: - Fix Version/s: 1.0.0 > [Python] Add dataset API to reference docs > -

[jira] [Created] (ARROW-7431) [Python] Add dataset API to reference docs

2019-12-18 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7431: Summary: [Python] Add dataset API to reference docs Key: ARROW-7431 URL: https://issues.apache.org/jira/browse/ARROW-7431 Project: Apache Arrow

[jira] [Created] (ARROW-7432) [Python] Add higher-level datasets functions

2019-12-18 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7432: Summary: [Python] Add higher-level datasets functions Key: ARROW-7432 URL: https://issues.apache.org/jira/browse/ARROW-7432 Project: Apache Arrow

[jira] [Updated] (ARROW-7385) [Python] ParquetDataset deadlock with different metadata_nthreads values

2019-12-18 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7385: - Description: {code} import pandas as pd import pyarrow as pa import pyarrow.parqu

[jira] [Commented] (ARROW-7385) [Python] ParquetDataset deadlock with different metadata_nthreads values

2019-12-18 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16999250#comment-16999250 ] Joris Van den Bossche commented on ARROW-7385: -- [~mrmathematica] Thanks for

[jira] [Commented] (ARROW-7385) [Python] ParquetDataset deadlock with different metadata_nthreads values

2019-12-18 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16999253#comment-16999253 ] Joris Van den Bossche commented on ARROW-7385: -- cc [~rgruener] > [Python] P

[jira] [Created] (ARROW-7497) [Python] pandas master failures: pandas.util.testing is deprecated

2020-01-06 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7497: Summary: [Python] pandas master failures: pandas.util.testing is deprecated Key: ARROW-7497 URL: https://issues.apache.org/jira/browse/ARROW-7497 Proj

[jira] [Assigned] (ARROW-7087) [Python] Table Metadata disappear when we write a partitioned dataset

2020-01-07 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-7087: Assignee: François Blanchard > [Python] Table Metadata disappear when we w

[jira] [Resolved] (ARROW-7087) [Python] Table Metadata disappear when we write a partitioned dataset

2020-01-07 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche resolved ARROW-7087. -- Resolution: Fixed Issue resolved by pull request 6127 [https://github.com/apach

[jira] [Updated] (ARROW-7512) [C++] Dictionary memo missing elements in id_to_dictionary_ map after deserialization

2020-01-08 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7512: - Summary: [C++] Dictionary memo missing elements in id_to_dictionary_ map after de

[jira] [Commented] (ARROW-7413) [Python][Dataset] Add tests for PartitionSchemeDiscovery

2020-01-08 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010713#comment-17010713 ] Joris Van den Bossche commented on ARROW-7413: -- [~bkietz] I suppose you are

[jira] [Assigned] (ARROW-7527) [Python] pandas/feather tests failing on pandas master

2020-01-09 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-7527: Assignee: Joris Van den Bossche > [Python] pandas/feather tests failing on

[jira] [Created] (ARROW-7527) [Python] pandas/feather tests failing on pandas master

2020-01-09 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7527: Summary: [Python] pandas/feather tests failing on pandas master Key: ARROW-7527 URL: https://issues.apache.org/jira/browse/ARROW-7527 Project: Apache A

[jira] [Updated] (ARROW-7527) [Python] pandas/feather tests failing on pandas master

2020-01-09 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7527: - Fix Version/s: 0.16.0 > [Python] pandas/feather tests failing on pandas master >

[jira] [Updated] (ARROW-7497) [Python] Test asserts: pandas.util.testing is deprecated, use pandas.testing instead

2020-01-09 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7497: - Summary: [Python] Test asserts: pandas.util.testing is deprecated, use pandas.tes

[jira] [Created] (ARROW-7528) [Python] The pandas.datetime class (import of datetime.datetime) is deprecated

2020-01-09 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7528: Summary: [Python] The pandas.datetime class (import of datetime.datetime) is deprecated Key: ARROW-7528 URL: https://issues.apache.org/jira/browse/ARROW-7528

[jira] [Updated] (ARROW-7528) [Python] The pandas.datetime class (import of datetime.datetime) and pandas.np are deprecated

2020-01-09 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7528: - Description: The {{pd.datetime}} was actually just an import from {{datetime.date

[jira] [Updated] (ARROW-7528) [Python] The pandas.datetime class (import of datetime.datetime) and pandas.np are deprecated

2020-01-09 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7528: - Summary: [Python] The pandas.datetime class (import of datetime.datetime) and pan

[jira] [Commented] (ARROW-7497) [Python] pandas master failures: pandas.util.testing is deprecated

2020-01-09 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011626#comment-17011626 ] Joris Van den Bossche commented on ARROW-7497: -- The failures are gone (fixed

[jira] [Assigned] (ARROW-7413) [Python][Dataset] Add tests for PartitionSchemeDiscovery

2020-01-09 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-7413: Assignee: Joris Van den Bossche (was: Ben Kietzman) > [Python][Dataset] A

[jira] [Commented] (ARROW-7498) [C++][Dataset] Rename DataFragment/DataSource/PartitionScheme

2020-01-09 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011893#comment-17011893 ] Joris Van den Bossche commented on ARROW-7498: -- I personally find "partition

[jira] [Updated] (ARROW-7545) [C++] [Dataset] Scanning dataset with dictionary type hangs

2020-01-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7545: - Summary: [C++] [Dataset] Scanning dataset with dictionary type hangs (was: [C++]

[jira] [Created] (ARROW-7545) [C++] Scanning dataset with dictionary type hangs

2020-01-10 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7545: Summary: [C++] Scanning dataset with dictionary type hangs Key: ARROW-7545 URL: https://issues.apache.org/jira/browse/ARROW-7545 Project: Apache Arrow

[jira] [Commented] (ARROW-7545) [C++] [Dataset] Scanning dataset with dictionary type hangs

2020-01-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012681#comment-17012681 ] Joris Van den Bossche commented on ARROW-7545: -- So if the table has a single

[jira] [Created] (ARROW-7547) [C++] [Python] [Dataset] Additional reader options in ParquetFileFormat

2020-01-10 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7547: Summary: [C++] [Python] [Dataset] Additional reader options in ParquetFileFormat Key: ARROW-7547 URL: https://issues.apache.org/jira/browse/ARROW-7547

[jira] [Updated] (ARROW-7561) [Doc][Python] fix conda environment command

2020-01-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7561: - Fix Version/s: 0.16.0 > [Doc][Python] fix conda environment command > ---

[jira] [Commented] (ARROW-7555) [Python] Drop support for python 2.7

2020-01-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014961#comment-17014961 ] Joris Van den Bossche commented on ARROW-7555: -- With conda, on the other han

[jira] [Closed] (ARROW-7555) [Python] Drop support for python 2.7

2020-01-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche closed ARROW-7555. Resolution: Duplicate > [Python] Drop support for python 2.7 >

[jira] [Commented] (ARROW-7555) [Python] Drop support for python 2.7

2020-01-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014963#comment-17014963 ] Joris Van den Bossche commented on ARROW-7555: -- But closing this as a duplic

[jira] [Updated] (ARROW-5757) [Python] Stop supporting Python 2.7

2020-01-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5757: - Fix Version/s: 1.0.0 (was: 2.0.0) > [Python] Stop supporti

[jira] [Commented] (ARROW-5757) [Python] Stop supporting Python 2.7

2020-01-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014964#comment-17014964 ] Joris Van den Bossche commented on ARROW-5757: -- Some discussion happened in

[jira] [Commented] (ARROW-5757) [Python] Stop supporting Python 2.7

2020-01-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014965#comment-17014965 ] Joris Van den Bossche commented on ARROW-5757: -- Wes: {quote}We probably ne

[jira] [Created] (ARROW-7569) [Python] Add API to map Arrow types to pandas ExtensionDtypes for to_pandas conversions

2020-01-14 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7569: Summary: [Python] Add API to map Arrow types to pandas ExtensionDtypes for to_pandas conversions Key: ARROW-7569 URL: https://issues.apache.org/jira/browse/ARROW-7

[jira] [Commented] (ARROW-7063) [C++] Schema print method prints too much metadata

2020-01-15 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016295#comment-17016295 ] Joris Van den Bossche commented on ARROW-7063: -- A reason to have at least so

[jira] [Commented] (ARROW-7063) [C++] Schema print method prints too much metadata

2020-01-16 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016787#comment-17016787 ] Joris Van den Bossche commented on ARROW-7063: -- > I can {{print(schema)}} an

[jira] [Commented] (ARROW-7063) [C++] Schema print method prints too much metadata

2020-01-16 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016789#comment-17016789 ] Joris Van den Bossche commented on ARROW-7063: -- To add: this pretty print is

[jira] [Commented] (ARROW-7415) [C++][Dataset] Implement IpcFormat for sources composed of ipc files

2020-01-16 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016790#comment-17016790 ] Joris Van den Bossche commented on ARROW-7415: -- Should we open issues about

[jira] [Commented] (ARROW-7556) [Python][Parquet] Performance regression in pyarrow-0.15.1 vs pyarrow-0.12.1 when reading a "partitioned parquet table" ?

2020-01-16 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016793#comment-17016793 ] Joris Van den Bossche commented on ARROW-7556: -- Just a note: I quickly check

[jira] [Updated] (ARROW-7556) [Python][Parquet] Performance regression in pyarrow-0.15.1 vs pyarrow-0.12.1 when reading a "partitioned parquet table" ?

2020-01-16 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7556: - Labels: parquet (was: ) > [Python][Parquet] Performance regression in pyarrow-0.

[jira] [Commented] (ARROW-7415) [C++][Dataset] Implement IpcFormat for sources composed of ipc files

2020-01-16 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016804#comment-17016804 ] Joris Van den Bossche commented on ARROW-7415: -- For R it's ARROW-7578 which

[jira] [Assigned] (ARROW-7311) [Python] Return filesystem and path from URI

2020-01-16 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-7311: Assignee: Krisztian Szucs > [Python] Return filesystem and path from URI >

[jira] [Resolved] (ARROW-7311) [Python] Return filesystem and path from URI

2020-01-16 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche resolved ARROW-7311. -- Fix Version/s: 0.16.0 Resolution: Fixed Issue resolved by pull request 6

[jira] [Created] (ARROW-7591) [Python] DictionaryArray.to_numpy returns dict of parts instead of numpy array

2020-01-16 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7591: Summary: [Python] DictionaryArray.to_numpy returns dict of parts instead of numpy array Key: ARROW-7591 URL: https://issues.apache.org/jira/browse/ARROW-7591

[jira] [Assigned] (ARROW-7591) [Python] DictionaryArray.to_numpy returns dict of parts instead of numpy array

2020-01-16 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-7591: Assignee: Joris Van den Bossche > [Python] DictionaryArray.to_numpy return

[jira] [Updated] (ARROW-7591) [Python] DictionaryArray.to_numpy returns dict of parts instead of numpy array

2020-01-16 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7591: - Fix Version/s: 0.16.0 > [Python] DictionaryArray.to_numpy returns dict of parts i

[jira] [Created] (ARROW-7593) [CI][Python] Python datasets failing on master / not run on CI

2020-01-16 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7593: Summary: [CI][Python] Python datasets failing on master / not run on CI Key: ARROW-7593 URL: https://issues.apache.org/jira/browse/ARROW-7593 Project:

[jira] [Updated] (ARROW-7593) [CI][Python] Python datasets failing on master / not run on CI

2020-01-16 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7593: - Fix Version/s: 0.16.0 > [CI][Python] Python datasets failing on master / not run

[jira] [Assigned] (ARROW-7593) [CI][Python] Python datasets failing on master / not run on CI

2020-01-17 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-7593: Assignee: Joris Van den Bossche > [CI][Python] Python datasets failing on

[jira] [Assigned] (ARROW-7431) [Python] Add dataset API to reference docs

2020-01-21 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-7431: Assignee: Joris Van den Bossche > [Python] Add dataset API to reference do

[jira] [Commented] (ARROW-7617) [Python] Slices of Dataframes with Categorical columns are not respected in write_to_dataset

2020-01-21 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020159#comment-17020159 ] Joris Van den Bossche commented on ARROW-7617: -- [~Filimonov] thanks for the

[jira] [Updated] (ARROW-7617) [Python] parquet.write_to_dataset creates empty partitions for non-observed dictionary items (categories)

2020-01-21 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7617: - Summary: [Python] parquet.write_to_dataset creates empty partitions for non-obser

[jira] [Reopened] (ARROW-7617) [Python] Slices of Dataframes with Categorical columns are not respected in write_to_dataset

2020-01-21 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reopened ARROW-7617: -- > [Python] Slices of Dataframes with Categorical columns are not respected in > wr

[jira] [Updated] (ARROW-7617) [Python] parquet.write_to_dataset creates empty partitions for non-observed dictionary items (categories)

2020-01-21 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7617: - Labels: dataset parquet (was: ) > [Python] parquet.write_to_dataset creates empt

[jira] [Commented] (ARROW-7617) [Python] parquet.write_to_dataset creates empty partitions for non-observed dictionary items (categories)

2020-01-21 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020160#comment-17020160 ] Joris Van den Bossche commented on ARROW-7617: -- Reopened and changed the tit

[jira] [Commented] (ARROW-7614) [Python] Slow performance in test_parquet.py::test_set_data_page_size

2020-01-21 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020171#comment-17020171 ] Joris Van den Bossche commented on ARROW-7614: -- I think it is due to the tes

[jira] [Commented] (ARROW-7608) [C++][Dataset] Expose more informational properties

2020-01-21 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020219#comment-17020219 ] Joris Van den Bossche commented on ARROW-7608: -- It would indeed be nice to h

[jira] [Updated] (ARROW-6579) [Python] Parallel pyarrow.parquet.write_to_dataset

2020-01-21 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-6579: - Labels: dataset parquet (was: dataset) > [Python] Parallel pyarrow.parquet.write

[jira] [Created] (ARROW-7634) [Python] Dataset tests failing on Windows to parse file path

2020-01-21 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7634: Summary: [Python] Dataset tests failing on Windows to parse file path Key: ARROW-7634 URL: https://issues.apache.org/jira/browse/ARROW-7634 Project: A

[jira] [Assigned] (ARROW-7634) [Python] Dataset tests failing on Windows to parse file path

2020-01-21 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-7634: Assignee: Joris Van den Bossche > [Python] Dataset tests failing on Window

[jira] [Created] (ARROW-7636) [Python] Clean-up the pyarrow.dataset.partitioning() API

2020-01-21 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7636: Summary: [Python] Clean-up the pyarrow.dataset.partitioning() API Key: ARROW-7636 URL: https://issues.apache.org/jira/browse/ARROW-7636 Project: Apache

[jira] [Created] (ARROW-7638) [Python] Segfault when inspecting dataset.Source with invalid file/partitioning

2020-01-21 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7638: Summary: [Python] Segfault when inspecting dataset.Source with invalid file/partitioning Key: ARROW-7638 URL: https://issues.apache.org/jira/browse/ARROW-7638
