[jira] [Created] (ARROW-5514) [C++] Printer for uint64 shows wrong values

2019-06-05 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5514:


 Summary: [C++] Printer for uint64 shows wrong values
 Key: ARROW-5514
 URL: https://issues.apache.org/jira/browse/ARROW-5514
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.13.0
Reporter: Joris Van den Bossche


From the example in ARROW-5430:

{code}
In [16]: pa.array([14989096668145380166, 15869664087396458664], 
type=pa.uint64())   

Out[16]: 

[
  -3457647405564171450,
  -2577079986313092952
]
{code}

I _think_ the actual conversion is correct, and it's only the printer that is 
going wrong, as {{to_numpy}} gives the correct values:

{code}
In [17]: pa.array([14989096668145380166, 15869664087396458664], 
type=pa.uint64()).to_numpy()

Out[17]: array([14989096668145380166, 15869664087396458664], dtype=uint64)
{code}
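The printed values are consistent with the unsigned bytes being formatted through a signed 64-bit path. A small standalone check (plain Python, no pyarrow required) reproduces the printed output from the input values:

```python
# Reinterpret an unsigned 64-bit integer as signed (two's complement),
# which is effectively what a printer using int64_t formatting would do.
def as_signed64(u):
    return u - 2**64 if u >= 2**63 else u

values = [14989096668145380166, 15869664087396458664]
print([as_signed64(v) for v in values])
# -> [-3457647405564171450, -2577079986313092952]
```

The match with Out[16] above supports the theory that only the display path is at fault, not the stored data.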



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5514) [C++] Printer for uint64 shows wrong values

2019-06-05 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856493#comment-16856493
 ] 

Antoine Pitrou commented on ARROW-5514:
---

This should be simple enough to fix if you want to get acquainted with the C++ 
codebase. You should probably take a look in {{pretty_print.cc}}. Tests must be 
added to {{pretty_print-test.cc}}.

> [C++] Printer for uint64 shows wrong values
> ---
>
> Key: ARROW-5514
> URL: https://issues.apache.org/jira/browse/ARROW-5514
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.13.0
>Reporter: Joris Van den Bossche
>Priority: Minor
>
> From the example in ARROW-5430:
> {code}
> In [16]: pa.array([14989096668145380166, 15869664087396458664], 
> type=pa.uint64()) 
>   
> Out[16]: 
> 
> [
>   -3457647405564171450,
>   -2577079986313092952
> ]
> {code}
> I _think_ the actual conversion is correct, and it's only the printer that is 
> going wrong, as {{to_numpy}} gives the correct values:
> {code}
> In [17]: pa.array([14989096668145380166, 15869664087396458664], 
> type=pa.uint64()).to_numpy()  
>   
> Out[17]: array([14989096668145380166, 15869664087396458664], dtype=uint64)
> {code}





[jira] [Commented] (ARROW-4972) [Go] Array equality

2019-06-05 Thread Sebastien Binet (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856501#comment-16856501
 ] 

Sebastien Binet commented on ARROW-4972:


To properly implement https://issues.apache.org/jira/browse/ARROW-5493, there 
is a need for comparing Records and thus Arrays.

I might dedicate a few cycles to tackle this.

Do you mind if I assign this issue to myself?

> [Go] Array equality
> ---
>
> Key: ARROW-4972
> URL: https://issues.apache.org/jira/browse/ARROW-4972
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Go
>Reporter: Alexandre Crayssac
>Assignee: Alexandre Crayssac
>Priority: Major
>






[jira] [Commented] (ARROW-5138) [Python/C++] Row group retrieval doesn't restore index properly

2019-06-05 Thread Joris Van den Bossche (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856508#comment-16856508
 ] 

Joris Van den Bossche commented on ARROW-5138:
--

[~wesmckinn] I don't think that will solve this problem. The _original_ 
dataframe (when converted to an arrow Table) had a trivial RangeIndex (starting 
at 0, step of 1), so the optimization would have been correctly applied 
according to that logic. 

It is only when a Table is sliced or split (into row groups, and then reading 
a single row group instead of the full table) that the RangeIndex metadata gets 
"out of date" and no longer matches the new (subsetted) arrow Table.

See also ARROW-5427 for a summary issue I made on this topic.
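A hedged sketch of one possible fix direction (not the actual implementation): instead of resetting the index, the stored RangeIndex metadata could be re-sliced to the row group's offset and length. Plain Python `range` stands in for `pandas.RangeIndex` here, since both have the same slicing semantics; the helper name is mine:

```python
# Hypothetical helper: re-slice the original RangeIndex metadata
# (start, stop, step) down to the rows covered by the subset.
def restore_index(start, stop, step, offset, length):
    full = range(start, stop, step)
    return full[offset:offset + length]

# Row group 1 of a 4-row table written with chunk_size=2 covers rows 2..3:
print(list(restore_index(0, 4, 1, 2, 2)))
# -> [2, 3]
```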

> [Python/C++] Row group retrieval doesn't restore index properly
> ---
>
> Key: ARROW-5138
> URL: https://issues.apache.org/jira/browse/ARROW-5138
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.13.0
>Reporter: Florian Jetter
>Priority: Minor
>  Labels: parquet
> Fix For: 0.14.0
>
>
> When retrieving row groups, the index is no longer properly restored to its 
> initial value and is instead set to a RangeIndex starting at zero no matter what. 
> Version 0.12.1 restored an Int64Index with the correct index values.
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> print(pa.__version__)
> df = pd.DataFrame(
> {"a": [1, 2, 3, 4]}
> )
> print("total DF")
> print(df.index)
> table = pa.Table.from_pandas(df)
> buf = pa.BufferOutputStream()
> pq.write_table(table, buf, chunk_size=2)
> reader = pa.BufferReader(buf.getvalue().to_pybytes())
> parquet_file = pq.ParquetFile(reader)
> rg = parquet_file.read_row_group(1)
> df_restored = rg.to_pandas()
> print("Row group")
> print(df_restored.index)
> {code}
> Previous behavior
> {code:python}
> 0.12.1
> total DF
> RangeIndex(start=0, stop=4, step=1)
> Row group
> Int64Index([2, 3], dtype='int64')
> {code}
> Behavior now
> {code:python}
> 0.13.0
> total DF
> RangeIndex(start=0, stop=4, step=1)
> Row group
> RangeIndex(start=0, stop=2, step=1)
> {code}





[jira] [Commented] (ARROW-2667) [C++/Python] Add pandas-like take method to Array

2019-06-05 Thread Joris Van den Bossche (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856516#comment-16856516
 ] 

Joris Van den Bossche commented on ARROW-2667:
--

[~wesmckinn] you renamed this issue to only be about Array (and opened 
ARROW-5454 for the ChunkedArray part). So then this can be closed? (the python 
Array part was tackled in ARROW-5291)

> [C++/Python] Add pandas-like take method to Array
> -
>
> Key: ARROW-2667
> URL: https://issues.apache.org/jira/browse/ARROW-2667
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.14.0
>
>
> We should add a {{take}} method to {{Array/ChunkedArray/Column}} that takes a 
> list of indices and returns a reordered array.
> For reference, see Pandas' interface: 
> https://github.com/pandas-dev/pandas/blob/2cbdd9a2cd19501c98582490e35c5402ae6de941/pandas/core/arrays/base.py#L466
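For readers unfamiliar with the semantics, NumPy's {{take}} illustrates the intended behaviour (this is only an illustration; the Arrow API may differ):

```python
import numpy as np

# take selects elements by integer position, in the order the indices
# appear, so it can reorder as well as subset.
arr = np.array([10, 20, 30])
print(np.take(arr, [2, 0, 1]))
# -> [30 10 20]
```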





[jira] [Comment Edited] (ARROW-2667) [C++/Python] Add pandas-like take method to Array

2019-06-05 Thread Joris Van den Bossche (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856516#comment-16856516
 ] 

Joris Van den Bossche edited comment on ARROW-2667 at 6/5/19 8:55 AM:
--

[~wesmckinn] you renamed this issue to only be about Array (and opened 
ARROW-5454 for the ChunkedArray part). So then this can be closed? (the python 
Array part was tackled in ARROW-5291)

Edit: the other issue is only about C++, so we can keep this open for the 
Python side of course.


was (Author: jorisvandenbossche):
[~wesmckinn] you renamed this issue to only be about Array (and opened 
ARROW-5454 for the ChunkedArray part). So then this can be closed? (the python 
Array part was tackled in ARROW-5291)

> [C++/Python] Add pandas-like take method to Array
> -
>
> Key: ARROW-2667
> URL: https://issues.apache.org/jira/browse/ARROW-2667
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.14.0
>
>
> We should add a {{take}} method to {{Array/ChunkedArray/Column}} that takes a 
> list of indices and returns a reordered array.
> For reference, see Pandas' interface: 
> https://github.com/pandas-dev/pandas/blob/2cbdd9a2cd19501c98582490e35c5402ae6de941/pandas/core/arrays/base.py#L466





[jira] [Resolved] (ARROW-5496) [R][CI] Fix relative paths in R codecov.io reporting

2019-06-05 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/ARROW-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Romain François resolved ARROW-5496.

Resolution: Fixed

Issue resolved by pull request 4464
[https://github.com/apache/arrow/pull/4464]

> [R][CI] Fix relative paths in R codecov.io reporting
> 
>
> Key: ARROW-5496
> URL: https://issues.apache.org/jira/browse/ARROW-5496
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/ARROW-5418 added coverage stats for R, 
> but due to an assumption in the coverage runner that the project would be at 
> the top level of the GitHub repository, the `r/` subdirectory was not 
> included, so R coverage stats were put in the wrong place, and detail files 
> (such as [https://codecov.io/gh/apache/arrow/src/master/R/ArrayData.R]) 
> return 404. 





[jira] [Commented] (ARROW-5450) [Python] TimestampArray.to_pylist() fails with OverflowError: Python int too large to convert to C long

2019-06-05 Thread Joris Van den Bossche (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856527#comment-16856527
 ] 

Joris Van den Bossche commented on ARROW-5450:
--

Thanks for the report!

The problem here is that pyarrow converts to pandas Timestamp objects, if 
pandas is installed (and otherwise to datetime.datetime objects). And pandas 
has the limitation of only supporting timestamps in the ns range of 
1677 - 2262 
([http://pandas-docs.github.io/pandas-docs-travis/user_guide/timeseries.html#timeseries-timestamp-limits]).

We could catch the overflow error and in that case still return a 
datetime.datetime object. I personally don't really like this data-dependent 
behaviour, but we already have this pandas-available-dependent behaviour 
(alternatively, we could also always return datetime.datetime, or put the 
return of pandas Timestamps behind a keyword).
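A sketch of the suggested fallback, under the assumption (mine, not pyarrow's API) of a helper taking microseconds since the epoch; it returns a pandas Timestamp when possible and falls back to datetime.datetime on overflow:

```python
import datetime

try:
    import pandas as pd
except ImportError:  # pandas is optional, as in pyarrow itself
    pd = None

def to_py_timestamp(us_since_epoch):
    # Hypothetical helper: prefer pd.Timestamp, fall back to
    # datetime.datetime when the value is outside the ns range
    # (OutOfBoundsDatetime is a ValueError subclass).
    epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
    as_datetime = epoch + datetime.timedelta(microseconds=us_since_epoch)
    if pd is None:
        return as_datetime
    try:
        return pd.Timestamp(as_datetime)
    except (OverflowError, ValueError):
        return as_datetime

print(to_py_timestamp(0))                     # 1970-01-01 00:00:00+00:00
print(to_py_timestamp(253402214400 * 10**6))  # year 9999: plain datetime
```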

 

> [Python] TimestampArray.to_pylist() fails with OverflowError: Python int too 
> large to convert to C long
> ---
>
> Key: ARROW-5450
> URL: https://issues.apache.org/jira/browse/ARROW-5450
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Tim Swast
>Priority: Major
>
> When I attempt to roundtrip from a list of moderately large (beyond what can 
> be represented in nanosecond precision, but within microsecond precision) 
> datetime objects to pyarrow and back, I get an OverflowError: Python int too 
> large to convert to C long.
> pyarrow version:
> {noformat}
> $ pip freeze | grep pyarrow
> pyarrow==0.13.0{noformat}
>  
> Reproduction:
> {code:java}
> import datetime
> import pandas
> import pyarrow
> import pytz
> timestamp_rows = [
> datetime.datetime(1, 1, 1, 0, 0, 0, tzinfo=pytz.utc),
> None,
> datetime.datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=pytz.utc),
> datetime.datetime(1970, 1, 1, 0, 0, 0, tzinfo=pytz.utc),
> ]
> timestamp_array = pyarrow.array(timestamp_rows, pyarrow.timestamp("us", 
> tz="UTC"))
> timestamp_roundtrip = timestamp_array.to_pylist()
> # ---
> # OverflowError Traceback (most recent call last)
> #  in 
> # > 1 timestamp_roundtrip = timestamp_array.to_pylist()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/array.pxi
>  in __iter__()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi
>  in pyarrow.lib.TimestampValue.as_py()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi
>  in pyarrow.lib._datetime_conversion_functions.lambda5()
> #
> # pandas/_libs/tslibs/timestamps.pyx in 
> pandas._libs.tslibs.timestamps.Timestamp.__new__()
> #
> # pandas/_libs/tslibs/conversion.pyx in 
> pandas._libs.tslibs.conversion.convert_to_tsobject()
> #
> # OverflowError: Python int too large to convert to C long
> {code}
> For good measure, I also tested with timezone-naive timestamps with the same 
> error:
> {code:java}
> naive_rows = [
> datetime.datetime(1, 1, 1, 0, 0, 0),
> None,
> datetime.datetime(9999, 12, 31, 23, 59, 59, 999999),
> datetime.datetime(1970, 1, 1, 0, 0, 0),
> ]
> naive_array = pyarrow.array(naive_rows, pyarrow.timestamp("us", tz=None))
> naive_roundtrip = naive_array.to_pylist()
> # ---
> # OverflowError Traceback (most recent call last)
> #  in 
> # > 1 naive_roundtrip = naive_array.to_pylist()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/array.pxi
>  in __iter__()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi
>  in pyarrow.lib.TimestampValue.as_py()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi
>  in pyarrow.lib._datetime_conversion_functions.lambda5()
> #
> # pandas/_libs/tslibs/timestamps.pyx in 
> pandas._libs.tslibs.timestamps.Timestamp.__new__()
> #
> # pandas/_libs/tslibs/conversion.pyx in 
> pandas._libs.tslibs.conversion.convert_to_tsobject()
> #
> # OverflowError: Python int too large to convert to C long
> {code}





[jira] [Resolved] (ARROW-5452) [R] Add documentation website (pkgdown)

2019-06-05 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/ARROW-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Romain François resolved ARROW-5452.

Resolution: Fixed

Issue resolved by pull request 4419
[https://github.com/apache/arrow/pull/4419]

> [R] Add documentation website (pkgdown)
> ---
>
> Key: ARROW-5452
> URL: https://issues.apache.org/jira/browse/ARROW-5452
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> pkgdown ([https://pkgdown.r-lib.org/]) is the standard for R package 
> documentation websites. Build this for arrow and deploy it at 
> https://arrow.apache.org/docs/r.





[jira] [Updated] (ARROW-5104) [Python/C++] Schema for empty tables include index column as integer

2019-06-05 Thread Joris Van den Bossche (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-5104:
-
Fix Version/s: 0.14.0

> [Python/C++] Schema for empty tables include index column as integer
> 
>
> Key: ARROW-5104
> URL: https://issues.apache.org/jira/browse/ARROW-5104
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.13.0
>Reporter: Florian Jetter
>Priority: Minor
> Fix For: 0.14.0
>
>
> The schema for an empty table/dataframe still includes the index as an 
> integer column instead of being serialized solely as a metadata reference 
> (see ARROW-1639)
> In the example below, the empty dataframe still holds `__index_level_0__` as 
> an integer column. Proper behavior would be to exclude it and reference the 
> index information in the pandas metadata, as is the case for a non-empty 
> dataframe.
> {code}
> In [1]: import pandas as pd
> In [2]: import pyarrow as pa
> In [3]: non_empty =  pd.DataFrame({"col": [1]})
> In [4]: empty = non_empty.drop(0)
> In [5]: empty
> Out[5]:
> Empty DataFrame
> Columns: [col]
> Index: []
> In [6]: pa.Table.from_pandas(non_empty)
> Out[6]:
> pyarrow.Table
> col: int64
> metadata
> 
> OrderedDict([(b'pandas',
>   b'{"index_columns": [{"kind": "range", "name": null, "start": '
>   b'0, "stop": 1, "step": 1}], "column_indexes": [{"name": null,'
>   b' "field_name": null, "pandas_type": "unicode", "numpy_type":'
>   b' "object", "metadata": {"encoding": "UTF-8"}}], "columns": ['
>   b'{"name": "col", "field_name": "col", "pandas_type": "int64",'
>   b' "numpy_type": "int64", "metadata": null}], "creator": {"lib'
>   b'rary": "pyarrow", "version": "0.13.0"}, "pandas_version": nu'
>   b'll}')])
> In [7]: pa.Table.from_pandas(empty)
> Out[7]:
> pyarrow.Table
> col: int64
> __index_level_0__: int64
> metadata
> 
> OrderedDict([(b'pandas',
>   b'{"index_columns": ["__index_level_0__"], "column_indexes": ['
>   b'{"name": null, "field_name": null, "pandas_type": "unicode",'
>   b' "numpy_type": "object", "metadata": {"encoding": "UTF-8"}}]'
>   b', "columns": [{"name": "col", "field_name": "col", "pandas_t'
>   b'ype": "int64", "numpy_type": "int64", "metadata": null}, {"n'
>   b'ame": null, "field_name": "__index_level_0__", "pandas_type"'
>   b': "int64", "numpy_type": "int64", "metadata": null}], "creat'
>   b'or": {"library": "pyarrow", "version": "0.13.0"}, "pandas_ve'
>   b'rsion": null}')])
> In [8]: pa.__version__
> Out[8]: '0.13.0'
> In [9]: ! python --version
> Python 3.6.7
> {code}





[jira] [Created] (ARROW-5515) Ensure JVM to have sufficient capacity for large number of local reference

2019-06-05 Thread Yurui Zhou (JIRA)
Yurui Zhou created ARROW-5515:
-

 Summary: Ensure JVM to have sufficient capacity for large number 
of local reference
 Key: ARROW-5515
 URL: https://issues.apache.org/jira/browse/ARROW-5515
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++, Java
Reporter: Yurui Zhou
Assignee: Yurui Zhou








[jira] [Updated] (ARROW-5515) Ensure JVM has sufficient capacity for large number of local reference

2019-06-05 Thread Yurui Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yurui Zhou updated ARROW-5515:
--
Summary: Ensure JVM has sufficient capacity for large number of local 
reference  (was: Ensure JVM to have sufficient capacity for large number of 
local reference)

> Ensure JVM has sufficient capacity for large number of local reference
> --
>
> Key: ARROW-5515
> URL: https://issues.apache.org/jira/browse/ARROW-5515
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++, Java
>Reporter: Yurui Zhou
>Assignee: Yurui Zhou
>Priority: Minor
>






[jira] [Assigned] (ARROW-1774) [C++] Add "view" function to create zero-copy views for compatible types, if supported

2019-06-05 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-1774:
-

Assignee: Antoine Pitrou

> [C++] Add "view" function to create zero-copy views for compatible types, if 
> supported
> --
>
> Key: ARROW-1774
> URL: https://issues.apache.org/jira/browse/ARROW-1774
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.14.0
>
>
> Similar to NumPy's {{ndarray.view}}, but with the restriction that the input 
> and output types have the same physical Arrow memory layout. This might be as 
> simple as adding a "zero copy only" option to the existing {{Cast}} kernel
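The NumPy behaviour referenced above, as a reminder of the intended semantics (illustration only; the Arrow function's name and signature are not settled here):

```python
import numpy as np

# ndarray.view reinterprets the same buffer under a new dtype with the
# same item size; no data is copied.
signed = np.array([-3457647405564171450], dtype=np.int64)
unsigned = signed.view(np.uint64)
print(int(unsigned[0]))         # -> 14989096668145380166
print(unsigned.base is signed)  # -> True: the view shares signed's buffer
```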





[jira] [Created] (ARROW-5516) Development page for pyarrow has a missing dependency in using pip

2019-06-05 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created ARROW-5516:
---

 Summary: Development page for pyarrow has a missing dependency in 
using pip
 Key: ARROW-5516
 URL: https://issues.apache.org/jira/browse/ARROW-5516
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.13.0
Reporter: Hyukjin Kwon


{code}
ImportError while loading conftest 
'/.../arrow/python/pyarrow/tests/conftest.py'.
pyarrow/tests/conftest.py:20: in <module>
import hypothesis as h
E   ModuleNotFoundError: No module named 'hypothesis'
{code}

I followed the guide, and it seems it requires another dependency, {{hypothesis}}, 
in addition to:

{code}
pip install six numpy pandas cython pytest
{code}





[jira] [Updated] (ARROW-5516) Development page for pyarrow has a missing dependency in using pip

2019-06-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5516:
--
Labels: pull-request-available  (was: )

> Development page for pyarrow has a missing dependency in using pip
> --
>
> Key: ARROW-5516
> URL: https://issues.apache.org/jira/browse/ARROW-5516
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.13.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> {code}
> ImportError while loading conftest 
> '/.../arrow/python/pyarrow/tests/conftest.py'.
> pyarrow/tests/conftest.py:20: in <module>
> import hypothesis as h
> E   ModuleNotFoundError: No module named 'hypothesis'
> {code}
> I followed the guide, and it seems it requires another dependency, {{hypothesis}}, 
> in addition to:
> {code}
> pip install six numpy pandas cython pytest
> {code}





[jira] [Updated] (ARROW-4723) Skip _files when reading a directory containing parquet files

2019-06-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4723:
--
Labels: parquet pull-request-available  (was: parquet)

> Skip _files when reading a directory containing parquet files
> -
>
> Key: ARROW-4723
> URL: https://issues.apache.org/jira/browse/ARROW-4723
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Hossein Falaki
>Priority: Major
>  Labels: parquet, pull-request-available
>
> It is common for Apache Spark and other big data platforms to write additional 
> metadata files, with names starting with an underscore, when saving parquet data.
> When using {{make_batch_reader}} to load a parquet directory containing such 
> files, we encounter the following error:
> {code:java}
> PetastormMetadataError Traceback (most recent call last)
> /databricks/python/lib/python3.6/site-packages/petastorm/etl/dataset_metadata.py
>  in infer_or_load_unischema(dataset)
> 388 try:
> --> 389 return get_schema(dataset) 
> 390 except PetastormMetadataError:
> /databricks/python/lib/python3.6/site-packages/petastorm/etl/dataset_metadata.py
>  in get_schema(dataset)
> 342 raise PetastormMetadataError( 
> --> 343 'Could not find _common_metadata file. Use materialize_dataset(..) 
> in' 
> 344 ' petastorm.etl.dataset_metadata.py to generate this file in your ETL 
> code.'
> PetastormMetadataError: Could not find _common_metadata file. Use 
> materialize_dataset(..) in petastorm.etl.dataset_metadata.py to generate this 
> file in your ETL code. You can generate it on an existing dataset using 
> petastorm-generate-metadata.py{code}
>  
> This is because our Runtime stores the following two files at the end of the 
> job:
> {code:java}
> dbfs:/tmp/petastorm/_committed_4686077819843716563
> _committed_4686077819843716563  1965
> dbfs:/tmp/petastorm/_started_4686077819843716563{code}
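A hypothetical workaround sketch (the helper and the `part-00000.parquet` name are mine, not petastorm's API): filter out bookkeeping files, whose names start with an underscore, before handing a listing to the reader:

```python
import os

def data_files(paths):
    # Keep only paths whose final component does not start with "_"
    # (skips _committed_*, _started_* and similar bookkeeping files).
    return [p for p in paths if not os.path.basename(p).startswith("_")]

listing = [
    "dbfs:/tmp/petastorm/part-00000.parquet",
    "dbfs:/tmp/petastorm/_committed_4686077819843716563",
    "dbfs:/tmp/petastorm/_started_4686077819843716563",
]
print(data_files(listing))
# -> ['dbfs:/tmp/petastorm/part-00000.parquet']
```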





[jira] [Assigned] (ARROW-5285) [C++][Plasma] GpuProcessHandle is not released when GPU object deleted

2019-06-05 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-5285:
---

Assignee: Antoine Pitrou  (was: shengjun.li)

> [C++][Plasma] GpuProcessHandle is not released when GPU object deleted
> --
>
> Key: ARROW-5285
> URL: https://issues.apache.org/jira/browse/ARROW-5285
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, C++ - Plasma, GPU
>Affects Versions: 0.13.0
>Reporter: shengjun.li
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> cpp/CMakeLists.txt
>   option(ARROW_CUDA "Build the Arrow CUDA extensions (requires CUDA toolkit)" 
> ON)
>   option(ARROW_PLASMA "Build the plasma object store along with Arrow" ON)
> In the plasma client, GpuProcessHandle is never released although the GPU 
> object is deleted.
> Thus, cuIpcCloseMemHandle is never called.
> When I repeatedly create and delete GPU memory, the following error may occur.
> IOError: Cuda Driver API call in 
> /home/zilliz/arrow/cpp/src/arrow/gpu/cuda_context.cc at line 155 failed with 
> code 208: cuIpcOpenMemHandle(&data, *handle, 
> CU_IPC_MEM_LAZY_ENABLE_PEER_ACCESS)
> Note: CUDA_ERROR_ALREADY_MAPPED = 208





[jira] [Reopened] (ARROW-5285) [C++][Plasma] GpuProcessHandle is not released when GPU object deleted

2019-06-05 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reopened ARROW-5285:
-

> [C++][Plasma] GpuProcessHandle is not released when GPU object deleted
> --
>
> Key: ARROW-5285
> URL: https://issues.apache.org/jira/browse/ARROW-5285
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, C++ - Plasma, GPU
>Affects Versions: 0.13.0
>Reporter: shengjun.li
>Assignee: shengjun.li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> cpp/CMakeLists.txt
>   option(ARROW_CUDA "Build the Arrow CUDA extensions (requires CUDA toolkit)" 
> ON)
>   option(ARROW_PLASMA "Build the plasma object store along with Arrow" ON)
> In the plasma client, GpuProcessHandle is never released although the GPU 
> object is deleted.
> Thus, cuIpcCloseMemHandle is never called.
> When I repeatedly create and delete GPU memory, the following error may occur.
> IOError: Cuda Driver API call in 
> /home/zilliz/arrow/cpp/src/arrow/gpu/cuda_context.cc at line 155 failed with 
> code 208: cuIpcOpenMemHandle(&data, *handle, 
> CU_IPC_MEM_LAZY_ENABLE_PEER_ACCESS)
> Note: CUDA_ERROR_ALREADY_MAPPED = 208





[jira] [Commented] (ARROW-5285) [C++][Plasma] GpuProcessHandle is not released when GPU object deleted

2019-06-05 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856712#comment-16856712
 ] 

Wes McKinney commented on ARROW-5285:
-

We leave fixed issues in the Resolved state.

> [C++][Plasma] GpuProcessHandle is not released when GPU object deleted
> --
>
> Key: ARROW-5285
> URL: https://issues.apache.org/jira/browse/ARROW-5285
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, C++ - Plasma, GPU
>Affects Versions: 0.13.0
>Reporter: shengjun.li
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> cpp/CMakeLists.txt
>   option(ARROW_CUDA "Build the Arrow CUDA extensions (requires CUDA toolkit)" 
> ON)
>   option(ARROW_PLASMA "Build the plasma object store along with Arrow" ON)
> In the plasma client, GpuProcessHandle is never released although the GPU 
> object is deleted.
> Thus, cuIpcCloseMemHandle is never called.
> When I repeatedly create and delete GPU memory, the following error may occur.
> IOError: Cuda Driver API call in 
> /home/zilliz/arrow/cpp/src/arrow/gpu/cuda_context.cc at line 155 failed with 
> code 208: cuIpcOpenMemHandle(&data, *handle, 
> CU_IPC_MEM_LAZY_ENABLE_PEER_ACCESS)
> Note: CUDA_ERROR_ALREADY_MAPPED = 208





[jira] [Resolved] (ARROW-5285) [C++][Plasma] GpuProcessHandle is not released when GPU object deleted

2019-06-05 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5285.
-
Resolution: Fixed

> [C++][Plasma] GpuProcessHandle is not released when GPU object deleted
> --
>
> Key: ARROW-5285
> URL: https://issues.apache.org/jira/browse/ARROW-5285
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, C++ - Plasma, GPU
>Affects Versions: 0.13.0
>Reporter: shengjun.li
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> cpp/CMakeLists.txt
>   option(ARROW_CUDA "Build the Arrow CUDA extensions (requires CUDA toolkit)" 
> ON)
>   option(ARROW_PLASMA "Build the plasma object store along with Arrow" ON)
> In the plasma client, GpuProcessHandle is never released although the GPU 
> object is deleted.
> Thus, cuIpcCloseMemHandle is never called.
> When I repeatedly create and delete GPU memory, the following error may occur.
> IOError: Cuda Driver API call in 
> /home/zilliz/arrow/cpp/src/arrow/gpu/cuda_context.cc at line 155 failed with 
> code 208: cuIpcOpenMemHandle(&data, *handle, 
> CU_IPC_MEM_LAZY_ENABLE_PEER_ACCESS)
> Note: CUDA_ERROR_ALREADY_MAPPED = 208





[jira] [Updated] (ARROW-4787) [C++] Include "null" values (perhaps with an option to toggle on/off) in hash kernel actions

2019-06-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4787:
--
Labels: pull-request-available  (was: )

> [C++] Include "null" values (perhaps with an option to toggle on/off) in hash 
> kernel actions
> 
>
> Key: ARROW-4787
> URL: https://issues.apache.org/jira/browse/ARROW-4787
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>
> Null is a meaningful value in the context of analytics. We should have the 
> option of considering it distinctly in e.g. {{ValueCounts}} 
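pandas offers a precedent for the proposed on/off toggle: {{value_counts}} counts nulls only when asked via {{dropna=False}} (shown here as motivation, not as the Arrow API):

```python
import pandas as pd

s = pd.Series([1, 1, None])
# Default: nulls are excluded from the counts.
print(s.value_counts().to_dict())          # -> {1.0: 2}
# With dropna=False the null bucket is reported too.
print(s.value_counts(dropna=False).sum())  # 3 values counted in total
```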





[jira] [Updated] (ARROW-2667) [C++/Python] Add pandas-like take method to Array

2019-06-05 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2667:

Fix Version/s: (was: 0.14.0)

> [C++/Python] Add pandas-like take method to Array
> -
>
> Key: ARROW-2667
> URL: https://issues.apache.org/jira/browse/ARROW-2667
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Priority: Major
>
> We should add a {{take}} method to {{Array/ChunkedArray/Column}} that takes a 
> list of indices and returns a reordered array.
> For reference, see Pandas' interface: 
> https://github.com/pandas-dev/pandas/blob/2cbdd9a2cd19501c98582490e35c5402ae6de941/pandas/core/arrays/base.py#L466





[jira] [Commented] (ARROW-2667) [C++/Python] Add pandas-like take method to Array

2019-06-05 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856749#comment-16856749
 ] 

Wes McKinney commented on ARROW-2667:
-

Yes, going ahead and closing

> [C++/Python] Add pandas-like take method to Array
> -
>
> Key: ARROW-2667
> URL: https://issues.apache.org/jira/browse/ARROW-2667
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.14.0
>
>
> We should add a {{take}} method to {{Array/ChunkedArray/Column}} that takes a 
> list of indices and returns a reordered array.
> For reference, see Pandas' interface: 
> https://github.com/pandas-dev/pandas/blob/2cbdd9a2cd19501c98582490e35c5402ae6de941/pandas/core/arrays/base.py#L466





[jira] [Closed] (ARROW-2667) [C++/Python] Add pandas-like take method to Array

2019-06-05 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-2667.
---
   Resolution: Fixed
Fix Version/s: 0.14.0

> [C++/Python] Add pandas-like take method to Array
> -
>
> Key: ARROW-2667
> URL: https://issues.apache.org/jira/browse/ARROW-2667
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: 0.14.0
>
>
> We should add a {{take}} method to {{Array/ChunkedArray/Column}} that takes a 
> list of indices and returns a reordered array.
> For reference, see Pandas' interface: 
> https://github.com/pandas-dev/pandas/blob/2cbdd9a2cd19501c98582490e35c5402ae6de941/pandas/core/arrays/base.py#L466





[jira] [Updated] (ARROW-5516) [Python] Development page for pyarrow has a missing dependency in using pip

2019-06-05 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-5516:

Summary: [Python] Development page for pyarrow has a missing dependency in 
using pip  (was: Development page for pyarrow has a missing dependency in using 
pip)

> [Python] Development page for pyarrow has a missing dependency in using pip
> ---
>
> Key: ARROW-5516
> URL: https://issues.apache.org/jira/browse/ARROW-5516
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.13.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code}
> ImportError while loading conftest 
> '/.../arrow/python/pyarrow/tests/conftest.py'.
> pyarrow/tests/conftest.py:20: in <module>
> import hypothesis as h
> E   ModuleNotFoundError: No module named 'hypothesis'
> {code}
> I followed the guide, and it seems it requires another dependency, 
> {{hypothesis}}, in addition to:
> {code}
> pip install six numpy pandas cython pytest
> {code}





[jira] [Created] (ARROW-5517) [C++] Header collection CMake logic should only consider filename without directory included

2019-06-05 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5517:
---

 Summary: [C++] Header collection CMake logic should only consider 
filename without directory included
 Key: ARROW-5517
 URL: https://issues.apache.org/jira/browse/ARROW-5517
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.14.0


If "internal" is in the directory name then all headers are currently excluded

See report at https://github.com/apache/arrow/issues/4469
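A minimal Python sketch of the bug and the proposed fix, using hypothetical header paths (the real logic lives in CMake; this only illustrates the matching rule):

```python
import os

# hypothetical header paths; only the second should count as internal
headers = [
    "arrow/vendored/datetime.h",
    "arrow/util/hashing_internal.h",
    "arrow/my_internal_dir/public_api.h",
]

# current behaviour: substring test against the whole path, so a directory
# whose name contains "internal" wrongly excludes every header under it
buggy_kept = [h for h in headers if "internal" not in h]

# proposed fix: test only the filename, not the directory portion
fixed_kept = [h for h in headers if "internal" not in os.path.basename(h)]
```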





[jira] [Created] (ARROW-5518) Set VectorSchemaRoot rowCount to 0 on allocateNew and clear

2019-06-05 Thread Johannes Luong (JIRA)
Johannes Luong created ARROW-5518:
-

 Summary: Set VectorSchemaRoot rowCount to 0 on allocateNew and 
clear 
 Key: ARROW-5518
 URL: https://issues.apache.org/jira/browse/ARROW-5518
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Johannes Luong


Set {{VectorSchemaRoot::rowCount}} to 0 in {{allocateNew()}} and {{clear()}}. 
This makes the behaviour of {{VectorSchemaRoot}} consistent with 
{{ValueVector}} implementations which set their {{valueCount}} to 0 on 
{{clear()}}.





[jira] [Updated] (ARROW-5518) [Java] Set VectorSchemaRoot rowCount to 0 on allocateNew and clear

2019-06-05 Thread Johannes Luong (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johannes Luong updated ARROW-5518:
--
Summary: [Java] Set VectorSchemaRoot rowCount to 0 on allocateNew and clear 
  (was: Set VectorSchemaRoot rowCount to 0 on allocateNew and clear )

> [Java] Set VectorSchemaRoot rowCount to 0 on allocateNew and clear 
> ---
>
> Key: ARROW-5518
> URL: https://issues.apache.org/jira/browse/ARROW-5518
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Johannes Luong
>Priority: Minor
>  Labels: newbie, patch, pull-request-available
>
> Set {{VectorSchemaRoot::rowCount}} to 0 in {{allocateNew()}} and {{clear()}}. 
> This makes the behaviour of {{VectorSchemaRoot}} consistent with 
> {{ValueVector}} implementations which set their {{valueCount}} to 0 on 
> {{clear()}}.





[jira] [Updated] (ARROW-5518) [Java] set VectorSchemaRoot rowCount to 0 on allocateNew and clear

2019-06-05 Thread Johannes Luong (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johannes Luong updated ARROW-5518:
--
Summary: [Java] set VectorSchemaRoot rowCount to 0 on allocateNew and clear 
  (was: [Java] Set VectorSchemaRoot rowCount to 0 on allocateNew and clear )

> [Java] set VectorSchemaRoot rowCount to 0 on allocateNew and clear 
> ---
>
> Key: ARROW-5518
> URL: https://issues.apache.org/jira/browse/ARROW-5518
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Johannes Luong
>Priority: Minor
>  Labels: newbie, patch, pull-request-available
>
> Set {{VectorSchemaRoot::rowCount}} to 0 in {{allocateNew()}} and {{clear()}}. 
> This makes the behaviour of {{VectorSchemaRoot}} consistent with 
> {{ValueVector}} implementations which set their {{valueCount}} to 0 on 
> {{clear()}}.





[jira] [Updated] (ARROW-5518) [Java] Set VectorSchemaRoot rowCount to 0 on allocateNew and clear

2019-06-05 Thread Johannes Luong (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johannes Luong updated ARROW-5518:
--
Summary: [Java] Set VectorSchemaRoot rowCount to 0 on allocateNew and clear 
  (was: [Java] set VectorSchemaRoot rowCount to 0 on allocateNew and clear )

> [Java] Set VectorSchemaRoot rowCount to 0 on allocateNew and clear 
> ---
>
> Key: ARROW-5518
> URL: https://issues.apache.org/jira/browse/ARROW-5518
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Johannes Luong
>Priority: Minor
>  Labels: newbie, patch, pull-request-available
>
> Set {{VectorSchemaRoot::rowCount}} to 0 in {{allocateNew()}} and {{clear()}}. 
> This makes the behaviour of {{VectorSchemaRoot}} consistent with 
> {{ValueVector}} implementations which set their {{valueCount}} to 0 on 
> {{clear()}}.





[jira] [Updated] (ARROW-1774) [C++] Add "view" function to create zero-copy views for compatible types, if supported

2019-06-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1774:
--
Labels: pull-request-available  (was: )

> [C++] Add "view" function to create zero-copy views for compatible types, if 
> supported
> --
>
> Key: ARROW-1774
> URL: https://issues.apache.org/jira/browse/ARROW-1774
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> Similar to NumPy's {{ndarray.view}}, but with the restriction that the input 
> and output types have the same physical Arrow memory layout. This might be as 
> simple as adding a "zero copy only" option to the existing {{Cast}} kernel
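For comparison, the referenced NumPy behaviour: {{ndarray.view}} reinterprets the same buffer under a layout-compatible type with no copy, which is the semantics the proposed Arrow "view" mirrors:

```python
import numpy as np

# zero-copy reinterpretation: int32 buffer viewed as uint32
a = np.array([1, 2, 3], dtype=np.int32)
b = a.view(np.uint32)
b[0] = 7  # writes through the shared buffer, so a[0] changes too
```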





[jira] [Assigned] (ARROW-5092) [C#] Source Link doesn't work with the C# release script

2019-06-05 Thread Yosuke Shiro (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yosuke Shiro reassigned ARROW-5092:
---

Assignee: Yosuke Shiro

> [C#] Source Link doesn't work with the C# release script
> 
>
> Key: ARROW-5092
> URL: https://issues.apache.org/jira/browse/ARROW-5092
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C#
>Affects Versions: 0.13.0
>Reporter: Eric Erhardt
>Assignee: Yosuke Shiro
>Priority: Major
>
> With the 0.13.0 C# NuGet package, [Source 
> Link|https://docs.microsoft.com/en-us/dotnet/standard/library-guidance/sourcelink]
>  doesn't work. The symbols can be downloaded from nuget.org correctly, but 
> when Visual Studio tries to download the code, it cannot find the correct 
> files.
> The following is why it doesn't work:
> The .NET tooling expects the build of an official release to happen in the 
> context of a {{git}} repository. This does 2 things to the produced assets:
>  # In the {{.nupkg}} file that is generated, the .NET tooling will encode the 
> current git commit's SHA hash into both the {{Apache.Arrow.nuspec}} file, and 
> into the compiled {{Apache.Arrow.dll}} assembly. Looking at the released 
> version that was published over the weekend: 
> [https://www.nuget.org/packages/Apache.Arrow/0.13.0], this information made 
> it into the {{.nuspec}} and the {{.dll}}:
> {code}
> [assembly: 
> AssemblyInformationalVersion("0.13.0+57de5c3adffe526f37366bb15c3ff0d4a2e84655")]
> <repository type="git" url="https://github.com/apache/arrow" 
> commit="57de5c3adffe526f37366bb15c3ff0d4a2e84655" />
> {code}
> However, I don't see how the [C# release 
> script|https://github.com/apache/arrow/blob/master/dev/release/post-06-csharp.sh]
>  could have done that. 
>  # Also, .NET has a feature called "Source Link", which allows for the source 
> code to be automatically downloaded from GitHub when debugging into this 
> library. The way the tooling works today, it requires that the git 
> repository's {{origin}} remote is set to 
> [https://github.com/apache/arrow.git]. The tooling uses the `origin` git 
> remote to encode the GitHub URL into the symbols file in the {{.snupkg}} 
> file.
> This, however, doesn't work with the 0.13.0 release that occurred over the 
> weekend. I tried using the Source Link feature, and it didn't automatically 
> download the source files from GitHub.
> Looking into the symbols file, I see the Source Link information that was 
> embedded:
> {code}
> 1: 
> '/home/kou/work/cpp/arrow.kou/apache-arrow-0.13.0/csharp/src/Apache.Arrow/Flatbuf/FlatBuffers/ByteBuffer.cs'
>  (#19c)C# (#3)   SHA-1 (#2) 
> 04-64-A0-48-82-EA-F5-B5-50-EC-CA-9F-85-75-E2-95-A4-EC-AB-B3 (#1b7)   
> 2: 
> '/home/kou/work/cpp/arrow.kou/apache-arrow-0.13.0/csharp/src/Apache.Arrow/Flatbuf/FlatBuffers/ByteBufferUtil.cs'
>  (#68f)C# (#3)   SHA-1 (#2) 
> F0-4F-28-53-88-A4-E0-6E-F1-1F-17-F6-CD-FE-0E-64-AB-0B-C2-95 (#6aa)   
> {code}
> {code:json}
> {
> "documents": {
> "/home/kou/work/cpp/arrow.kou/*": 
> "https://raw.githubusercontent.com/kou/arrow/57de5c3adffe526f37366bb15c3ff0d4a2e84655/*",
> "/home/kou/work/cpp/arrow.kou/cpp/submodules/parquet-testing/*": 
> "https://raw.githubusercontent.com/apache/parquet-testing/bb7b6abbb3fbeff845646364a4286142127be04c/*"
> }
> }
> {code}
> Here it appears the {{origin}} remote was set to {{kou/arrow}}, and not 
> {{apache/arrow}}. Also, it appears the {{apache-arrow-0.13.0}} folder was 
> under a git repository, and so the sources aren't matched up with the git 
> repository. (Basically that folder shouldn't have appeared in the Documents 
> list that has the {{.cs}} file path.) I think this explains how (1) above 
> happened - the build was under a git repository - but this script downloaded 
> an extra copy of the sources into that git repository.
> I'm wondering how we can fix either this script, or the .NET Tooling, or 
> both, to make this experience better for the next release. I think we need to 
> ensure two things:
>  # The git commit information is set correctly in the {{.nuspec}} and the 
> {{.dll}} when the release build is run. I think it just happened by pure luck 
> this time. It just so happened that the script was executed in an already 
> established repo, and it just so happened to be on the right commit (or maybe 
> it wasn't the right commit?).
>  # The source link information is set correctly in the symbols file.
> [~wesmckinn] [~kou]





[jira] [Resolved] (ARROW-5516) [Python] Development page for pyarrow has a missing dependency in using pip

2019-06-05 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-5516.
---
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4477
[https://github.com/apache/arrow/pull/4477]

> [Python] Development page for pyarrow has a missing dependency in using pip
> ---
>
> Key: ARROW-5516
> URL: https://issues.apache.org/jira/browse/ARROW-5516
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.13.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code}
> ImportError while loading conftest 
> '/.../arrow/python/pyarrow/tests/conftest.py'.
> pyarrow/tests/conftest.py:20: in <module>
> import hypothesis as h
> E   ModuleNotFoundError: No module named 'hypothesis'
> {code}
> I followed the guide, and it seems it requires another dependency, 
> {{hypothesis}}, in addition to:
> {code}
> pip install six numpy pandas cython pytest
> {code}





[jira] [Commented] (ARROW-4301) [Java][Gandiva] Maven snapshot version update does not seem to update Gandiva submodule

2019-06-05 Thread Praveen Kumar Desabandu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856913#comment-16856913
 ] 

Praveen Kumar Desabandu commented on ARROW-4301:


[~wesmckinn] I guess the problem is that Gandiva is not part of the default 
Maven profile. To include Gandiva we would need to run Maven with the Gandiva 
profile, like the following:

mvn release:prepare -Dtag=apache-arrow-0.14.0 -DreleaseVersion=0.14.0 
-DautoVersionSubmodules -DdevelopmentVersion=0.15.0-SNAPSHOT -P gandiva

Please note the profile added at the end.

I could not run the commands [~kou] mentioned since I do not have the GPG keys 
set up, but my guess is that it should work.

Also note that you will have the same issue for the new ORC adapter being 
introduced for doing ORC reads natively.

> [Java][Gandiva] Maven snapshot version update does not seem to update Gandiva 
> submodule
> ---
>
> Key: ARROW-4301
> URL: https://issues.apache.org/jira/browse/ARROW-4301
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Gandiva, Java
>Reporter: Wes McKinney
>Assignee: Praveen Kumar Desabandu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> See 
> https://github.com/apache/arrow/commit/a486db8c1476be1165981c4fe22996639da8e550.
>  This is breaking the build so I'm going to patch manually





[jira] [Commented] (ARROW-5480) [Python] Pandas categorical type doesn't survive a round-trip through parquet

2019-06-05 Thread Joris Van den Bossche (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856967#comment-16856967
 ] 

Joris Van den Bossche commented on ARROW-5480:
--

[~wesmckinn] I think this can be closed as duplicate of the other issue?

> [Python] Pandas categorical type doesn't survive a round-trip through parquet
> -
>
> Key: ARROW-5480
> URL: https://issues.apache.org/jira/browse/ARROW-5480
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.11.1, 0.13.0
> Environment: python: 3.7.3.final.0
> python-bits: 64
> OS: Linux
> OS-release: 5.0.0-15-generic
> machine: x86_64
> processor: x86_64
> byteorder: little
> pandas: 0.24.2
> numpy: 1.16.4
> pyarrow: 0.13.0
>Reporter: Karl Dunkle Werner
>Priority: Minor
>
> A string categorical variable written from pandas to parquet is read back as 
> string (object dtype). I expected it to be read back as category.
> The same thing happens if the category is numeric -- a numeric category is 
> read back as int64.
> In the code below, I tried out an in-memory arrow Table, which successfully 
> translates categories back to pandas. However, when I write to a parquet 
> file, the category type is not preserved.
> In the scheme of things, this isn't a big deal, but it's a small surprise.
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'x': pd.Categorical(['a', 'a', 'b', 'b'])})
> df.dtypes  # category
> # This works:
> pa.Table.from_pandas(df).to_pandas().dtypes  # category
> df.to_parquet("categories.parquet")
> # This reads back object, but I expected category
> pd.read_parquet("categories.parquet").dtypes  # object
> # Numeric categories have the same issue:
> df_num = pd.DataFrame({'x': pd.Categorical([1, 1, 2, 2])})
> df_num.dtypes # category
> pa.Table.from_pandas(df_num).to_pandas().dtypes  # category
> df_num.to_parquet("categories_num.parquet")
> # This reads back int64, but I expected category
> pd.read_parquet("categories_num.parquet").dtypes  # int64
> {code}
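A pandas-only sketch (a hypothetical workaround, not the eventual fix) that simulates the lossy roundtrip described above in memory and then re-applies the dtype by hand:

```python
import pandas as pd

# simulate the lossy roundtrip: category -> object, which is what
# read_parquet currently returns for string categoricals
s = pd.Series(pd.Categorical(['a', 'a', 'b', 'b']))
roundtripped = s.astype(object)

# manual workaround until the dtype survives parquet: re-apply category
restored = roundtripped.astype('category')
```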





[jira] [Commented] (ARROW-5480) [Python] Pandas categorical type doesn't survive a round-trip through parquet

2019-06-05 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856970#comment-16856970
 ] 

Wes McKinney commented on ARROW-5480:
-

I'm not sure -- I think the scope of work in ARROW-3246 may be slightly 
different. I'd like to look at the Parquet-Categorical stuff sometime this 
month so I can look at both issues more closely then

> [Python] Pandas categorical type doesn't survive a round-trip through parquet
> -
>
> Key: ARROW-5480
> URL: https://issues.apache.org/jira/browse/ARROW-5480
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.11.1, 0.13.0
> Environment: python: 3.7.3.final.0
> python-bits: 64
> OS: Linux
> OS-release: 5.0.0-15-generic
> machine: x86_64
> processor: x86_64
> byteorder: little
> pandas: 0.24.2
> numpy: 1.16.4
> pyarrow: 0.13.0
>Reporter: Karl Dunkle Werner
>Priority: Minor
>
> A string categorical variable written from pandas to parquet is read back as 
> string (object dtype). I expected it to be read back as category.
> The same thing happens if the category is numeric -- a numeric category is 
> read back as int64.
> In the code below, I tried out an in-memory arrow Table, which successfully 
> translates categories back to pandas. However, when I write to a parquet 
> file, the category type is not preserved.
> In the scheme of things, this isn't a big deal, but it's a small surprise.
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'x': pd.Categorical(['a', 'a', 'b', 'b'])})
> df.dtypes  # category
> # This works:
> pa.Table.from_pandas(df).to_pandas().dtypes  # category
> df.to_parquet("categories.parquet")
> # This reads back object, but I expected category
> pd.read_parquet("categories.parquet").dtypes  # object
> # Numeric categories have the same issue:
> df_num = pd.DataFrame({'x': pd.Categorical([1, 1, 2, 2])})
> df_num.dtypes # category
> pa.Table.from_pandas(df_num).to_pandas().dtypes  # category
> df_num.to_parquet("categories_num.parquet")
> # This reads back int64, but I expected category
> pd.read_parquet("categories_num.parquet").dtypes  # int64
> {code}





[jira] [Commented] (ARROW-5450) [Python] TimestampArray.to_pylist() fails with OverflowError: Python int too large to convert to C long

2019-06-05 Thread Tim Swast (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857049#comment-16857049
 ] 

Tim Swast commented on ARROW-5450:
--

Since datetime.datetime objects don't support nanosecond precision, pandas 
Timestamp is a good default with nanosecond precision columns. But with 
microsecond precision objects, I'd always prefer a datetime.datetime object.

> [Python] TimestampArray.to_pylist() fails with OverflowError: Python int too 
> large to convert to C long
> ---
>
> Key: ARROW-5450
> URL: https://issues.apache.org/jira/browse/ARROW-5450
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Tim Swast
>Priority: Major
>
> When I attempt to roundtrip from a list of moderately large (beyond what can 
> be represented in nanosecond precision, but within microsecond precision) 
> datetime objects to pyarrow and back, I get an OverflowError: Python int too 
> large to convert to C long.
> pyarrow version:
> {noformat}
> $ pip freeze | grep pyarrow
> pyarrow==0.13.0{noformat}
>  
> Reproduction:
> {code:java}
> import datetime
> import pandas
> import pyarrow
> import pytz
> timestamp_rows = [
> datetime.datetime(1, 1, 1, 0, 0, 0, tzinfo=pytz.utc),
> None,
> datetime.datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=pytz.utc),
> datetime.datetime(1970, 1, 1, 0, 0, 0, tzinfo=pytz.utc),
> ]
> timestamp_array = pyarrow.array(timestamp_rows, pyarrow.timestamp("us", 
> tz="UTC"))
> timestamp_roundtrip = timestamp_array.to_pylist()
> # ---
> # OverflowError Traceback (most recent call last)
> #  in 
> # > 1 timestamp_roundtrip = timestamp_array.to_pylist()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/array.pxi
>  in __iter__()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi
>  in pyarrow.lib.TimestampValue.as_py()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi
>  in pyarrow.lib._datetime_conversion_functions.lambda5()
> #
> # pandas/_libs/tslibs/timestamps.pyx in 
> pandas._libs.tslibs.timestamps.Timestamp.__new__()
> #
> # pandas/_libs/tslibs/conversion.pyx in 
> pandas._libs.tslibs.conversion.convert_to_tsobject()
> #
> # OverflowError: Python int too large to convert to C long
> {code}
> For good measure, I also tested with timezone-naive timestamps with the same 
> error:
> {code:java}
> naive_rows = [
> datetime.datetime(1, 1, 1, 0, 0, 0),
> None,
> datetime.datetime(9999, 12, 31, 23, 59, 59, 999999),
> datetime.datetime(1970, 1, 1, 0, 0, 0),
> ]
> naive_array = pyarrow.array(naive_rows, pyarrow.timestamp("us", tz=None))
> naive_roundtrip = naive_array.to_pylist()
> # ---
> # OverflowError Traceback (most recent call last)
> #  in 
> # > 1 naive_roundtrip = naive_array.to_pylist()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/array.pxi
>  in __iter__()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi
>  in pyarrow.lib.TimestampValue.as_py()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi
>  in pyarrow.lib._datetime_conversion_functions.lambda5()
> #
> # pandas/_libs/tslibs/timestamps.pyx in 
> pandas._libs.tslibs.timestamps.Timestamp.__new__()
> #
> # pandas/_libs/tslibs/conversion.pyx in 
> pandas._libs.tslibs.conversion.convert_to_tsobject()
> #
> # OverflowError: Python int too large to convert to C long
> {code}
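A stdlib-only sketch of why the overflow happens: the maximum microsecond-precision timestamp fits in an int64 microsecond column, but scaling it to the nanoseconds pandas' {{Timestamp}} requires does not:

```python
import datetime

# compute the max datetime as an exact integer count of epoch microseconds
epoch = datetime.datetime(1970, 1, 1)
delta = datetime.datetime(9999, 12, 31, 23, 59, 59, 999999) - epoch
micros = delta.days * 86400 * 10**6 + delta.seconds * 10**6 + delta.microseconds

fits_as_us = micros <= 2**63 - 1          # storable as timestamp[us]
fits_as_ns = micros * 1000 <= 2**63 - 1   # nanosecond scaling overflows int64

# plain datetime arithmetic recovers the value without any overflow
recovered = epoch + datetime.timedelta(microseconds=micros)
```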





[jira] [Comment Edited] (ARROW-5450) [Python] TimestampArray.to_pylist() fails with OverflowError: Python int too large to convert to C long

2019-06-05 Thread Tim Swast (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857049#comment-16857049
 ] 

Tim Swast edited comment on ARROW-5450 at 6/5/19 9:24 PM:
--

Since datetime.datetime objects don't support nanosecond precision, pandas 
Timestamp is a good default with nanosecond precision columns. But with 
microsecond precision columns, I'd always prefer a datetime.datetime object.


was (Author: tswast):
Since datetime.datetime objects don't support nanosecond precision, pandas 
Timestamp is a good default with nanosecond precision columns. But with 
microsecond precision objects, I'd always prefer a datetime.datetime object.

> [Python] TimestampArray.to_pylist() fails with OverflowError: Python int too 
> large to convert to C long
> ---
>
> Key: ARROW-5450
> URL: https://issues.apache.org/jira/browse/ARROW-5450
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Tim Swast
>Priority: Major
>
> When I attempt to roundtrip from a list of moderately large (beyond what can 
> be represented in nanosecond precision, but within microsecond precision) 
> datetime objects to pyarrow and back, I get an OverflowError: Python int too 
> large to convert to C long.
> pyarrow version:
> {noformat}
> $ pip freeze | grep pyarrow
> pyarrow==0.13.0{noformat}
>  
> Reproduction:
> {code:java}
> import datetime
> import pandas
> import pyarrow
> import pytz
> timestamp_rows = [
> datetime.datetime(1, 1, 1, 0, 0, 0, tzinfo=pytz.utc),
> None,
> datetime.datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=pytz.utc),
> datetime.datetime(1970, 1, 1, 0, 0, 0, tzinfo=pytz.utc),
> ]
> timestamp_array = pyarrow.array(timestamp_rows, pyarrow.timestamp("us", 
> tz="UTC"))
> timestamp_roundtrip = timestamp_array.to_pylist()
> # ---
> # OverflowError Traceback (most recent call last)
> #  in 
> # > 1 timestamp_roundtrip = timestamp_array.to_pylist()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/array.pxi
>  in __iter__()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi
>  in pyarrow.lib.TimestampValue.as_py()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi
>  in pyarrow.lib._datetime_conversion_functions.lambda5()
> #
> # pandas/_libs/tslibs/timestamps.pyx in 
> pandas._libs.tslibs.timestamps.Timestamp.__new__()
> #
> # pandas/_libs/tslibs/conversion.pyx in 
> pandas._libs.tslibs.conversion.convert_to_tsobject()
> #
> # OverflowError: Python int too large to convert to C long
> {code}
> For good measure, I also tested with timezone-naive timestamps with the same 
> error:
> {code:java}
> naive_rows = [
> datetime.datetime(1, 1, 1, 0, 0, 0),
> None,
> datetime.datetime(9999, 12, 31, 23, 59, 59, 999999),
> datetime.datetime(1970, 1, 1, 0, 0, 0),
> ]
> naive_array = pyarrow.array(naive_rows, pyarrow.timestamp("us", tz=None))
> naive_roundtrip = naive_array.to_pylist()
> # ---
> # OverflowError Traceback (most recent call last)
> #  in 
> # > 1 naive_roundtrip = naive_array.to_pylist()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/array.pxi
>  in __iter__()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi
>  in pyarrow.lib.TimestampValue.as_py()
> #
> # 
> ~/.pyenv/versions/3.6.4/envs/scratch/lib/python3.6/site-packages/pyarrow/scalar.pxi
>  in pyarrow.lib._datetime_conversion_functions.lambda5()
> #
> # pandas/_libs/tslibs/timestamps.pyx in 
> pandas._libs.tslibs.timestamps.Timestamp.__new__()
> #
> # pandas/_libs/tslibs/conversion.pyx in 
> pandas._libs.tslibs.conversion.convert_to_tsobject()
> #
> # OverflowError: Python int too large to convert to C long
> {code}





[jira] [Updated] (ARROW-5497) [R][Release] Build and publish R package docs

2019-06-05 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-5497:
---
Description: 
https://issues.apache.org/jira/browse/ARROW-5452 added the R pkgdown site 
config. Adding the wiring into the apidocs build scripts was deferred because 
there was some discussion about which workflow was supported and which was 
deprecated.  

Uwe says: "Have a look at 
[https://github.com/apache/arrow/blob/master/docs/Dockerfile] and 
[https://github.com/apache/arrow/blob/master/ci/docker_build_sphinx.sh] Add 
that and a docs-r entry in the main {{docker-compose.yml}} should be sufficient 
to get it running in the docker setup. But actually I would rather like to see 
that we also add the R build to the above linked files."

  was:https://issues.apache.org/jira/browse/ARROW-5452 added the R pkgdown site 
config. Adding the wiring into the apidocs build scripts was deferred because 
there was some discussion about which workflow was supported and which was 
deprecated.  


> [R][Release] Build and publish R package docs
> -
>
> Key: ARROW-5497
> URL: https://issues.apache.org/jira/browse/ARROW-5497
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools, Documentation, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> https://issues.apache.org/jira/browse/ARROW-5452 added the R pkgdown site 
> config. Adding the wiring into the apidocs build scripts was deferred because 
> there was some discussion about which workflow was supported and which was 
> deprecated.  
> Uwe says: "Have a look at 
> [https://github.com/apache/arrow/blob/master/docs/Dockerfile] and 
> [https://github.com/apache/arrow/blob/master/ci/docker_build_sphinx.sh] Add 
> that and a docs-r entry in the main {{docker-compose.yml}} should be 
> sufficient to get it running in the docker setup. But actually I would rather 
> like to see that we also add the R build to the above linked files."





[jira] [Commented] (ARROW-5497) [R][Release] Build and publish R package docs

2019-06-05 Thread Neal Richardson (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857089#comment-16857089
 ] 

Neal Richardson commented on ARROW-5497:


[~xhochy] (bringing discussion over here from the old PR): maybe I'm missing 
it, but I don't see in those where the Java and JS docs get built and added to 
the site. That sounds like the part of the process where the R docs need to be 
added.

> [R][Release] Build and publish R package docs
> -
>
> Key: ARROW-5497
> URL: https://issues.apache.org/jira/browse/ARROW-5497
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools, Documentation, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> https://issues.apache.org/jira/browse/ARROW-5452 added the R pkgdown site 
> config. Adding the wiring into the apidocs build scripts was deferred because 
> there was some discussion about which workflow was supported and which was 
> deprecated.  
> Uwe says: "Have a look at 
> [https://github.com/apache/arrow/blob/master/docs/Dockerfile] and 
> [https://github.com/apache/arrow/blob/master/ci/docker_build_sphinx.sh] Add 
> that and a docs-r entry in the main {{docker-compose.yml}} should be 
> sufficient to get it running in the docker setup. But actually I would rather 
> like to see that we also add the R build to the above linked files."





[jira] [Commented] (ARROW-4972) [Go] Array equality

2019-06-05 Thread Alexandre Crayssac (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857090#comment-16857090
 ] 

Alexandre Crayssac commented on ARROW-4972:
---

No no, feel free to go ahead.

I have some work in progress in my source tree, I can push it if you want.
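
The core idea behind array equality in Arrow-style arrays is that a slot only contributes its value when it is valid (non-null), so equality must compare validity masks as well as values. A minimal sketch of that idea in plain Go follows; the `Int64Array` type and `arrayEqual` helper are illustrative stand-ins, not the actual arrow/go API.

```go
package main

import "fmt"

// Int64Array models an Arrow-style array as values plus a validity mask.
// Valid[i] == true means slot i is non-null. Valid is assumed to have the
// same length as Values. These names are hypothetical, for illustration.
type Int64Array struct {
	Values []int64
	Valid  []bool
}

// arrayEqual reports whether two arrays are equal: same length, same
// null pattern, and equal values in every non-null slot.
func arrayEqual(a, b Int64Array) bool {
	if len(a.Values) != len(b.Values) {
		return false
	}
	for i := range a.Values {
		if a.Valid[i] != b.Valid[i] {
			return false // one slot is null, the other is not
		}
		if a.Valid[i] && a.Values[i] != b.Values[i] {
			return false // both valid but the values differ
		}
	}
	return true
}

func main() {
	x := Int64Array{Values: []int64{1, 2, 0}, Valid: []bool{true, true, false}}
	y := Int64Array{Values: []int64{1, 2, 99}, Valid: []bool{true, true, false}}
	// Null slots compare equal regardless of the underlying buffer value.
	fmt.Println(arrayEqual(x, y)) // prints "true"
}
```

The real implementation additionally has to handle nested types and offset/sliced arrays, but the null-aware comparison above is the essence of the kernel.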

> [Go] Array equality
> ---
>
> Key: ARROW-4972
> URL: https://issues.apache.org/jira/browse/ARROW-4972
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Go
>Reporter: Alexandre Crayssac
>Assignee: Alexandre Crayssac
>Priority: Major
>






[jira] [Commented] (ARROW-5497) [R][Release] Build and publish R package docs

2019-06-05 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857095#comment-16857095
 ] 

Uwe L. Korn commented on ARROW-5497:


I'm not sure whether the JS and Java docs currently get built at all. The 
{{gen_apidocs}} setup broke at some point; the plan was to migrate everything 
to the main {{docker-compose.yml}}, but that just hasn't happened yet.



[jira] [Updated] (ARROW-5497) [R][Release] Build and publish R package docs

2019-06-05 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-5497:
---
Component/s: JavaScript
 Java

> [R][Release] Build and publish R package docs
> -
>
> Key: ARROW-5497
> URL: https://issues.apache.org/jira/browse/ARROW-5497
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools, Documentation, Java, JavaScript, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> https://issues.apache.org/jira/browse/ARROW-5452 added the R pkgdown site 
> config. Adding the wiring into the apidocs build scripts was deferred because 
> there was some discussion about which workflow was supported and which was 
> deprecated.  
> Uwe says: "Have a look at 
> [https://github.com/apache/arrow/blob/master/docs/Dockerfile] and 
> [https://github.com/apache/arrow/blob/master/ci/docker_build_sphinx.sh] Add 
> that and a docs-r entry in the main {{docker-compose.yml}} should be 
> sufficient to get it running in the docker setup. But actually I would rather 
> like to see that we also add the R build to the above linked files."





[jira] [Updated] (ARROW-5497) [Release] Build and publish R/Java/JS docs

2019-06-05 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-5497:
---
Summary: [Release] Build and publish R/Java/JS docs  (was: [R][Release] 
Build and publish R package docs)

> [Release] Build and publish R/Java/JS docs
> --
>
> Key: ARROW-5497
> URL: https://issues.apache.org/jira/browse/ARROW-5497
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools, Documentation, Java, JavaScript, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> https://issues.apache.org/jira/browse/ARROW-5452 added the R pkgdown site 
> config. Adding the wiring into the apidocs build scripts was deferred because 
> there was some discussion about which workflow was supported and which was 
> deprecated.  
> Uwe says: "Have a look at 
> [https://github.com/apache/arrow/blob/master/docs/Dockerfile] and 
> [https://github.com/apache/arrow/blob/master/ci/docker_build_sphinx.sh] Add 
> that and a docs-r entry in the main {{docker-compose.yml}} should be 
> sufficient to get it running in the docker setup. But actually I would rather 
> like to see that we also add the R build to the above linked files."





[jira] [Commented] (ARROW-5497) [R][Release] Build and publish R package docs

2019-06-05 Thread Neal Richardson (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857098#comment-16857098
 ] 

Neal Richardson commented on ARROW-5497:


I see. [https://github.com/apache/arrow-site/tree/asf-site/docs] shows that 
neither was built for the 0.13 release.

I'll broaden the scope of this ticket to fix all three in the preferred docker 
setup.



[jira] [Updated] (ARROW-5497) [Release] Build and publish R/Java/JS docs

2019-06-05 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-5497:
---
Priority: Blocker  (was: Major)

> [Release] Build and publish R/Java/JS docs
> --
>
> Key: ARROW-5497
> URL: https://issues.apache.org/jira/browse/ARROW-5497
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools, Documentation, Java, JavaScript, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Blocker
> Fix For: 0.14.0
>
>
> Edit: this ticket was originally just about adding the R package docs, but it 
> seems that the JS and Java docs aren't getting built as part of the release 
> process anymore, so that needs to be fixed.
>  
> Original description:
> https://issues.apache.org/jira/browse/ARROW-5452 added the R pkgdown site 
> config. Adding the wiring into the apidocs build scripts was deferred because 
> there was some discussion about which workflow was supported and which was 
> deprecated.  
> Uwe says: "Have a look at 
> [https://github.com/apache/arrow/blob/master/docs/Dockerfile] and 
> [https://github.com/apache/arrow/blob/master/ci/docker_build_sphinx.sh] Add 
> that and a docs-r entry in the main {{docker-compose.yml}} should be 
> sufficient to get it running in the docker setup. But actually I would rather 
> like to see that we also add the R build to the above linked files."





[jira] [Updated] (ARROW-5497) [Release] Build and publish R/Java/JS docs

2019-06-05 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-5497:
---
Description: 
Edit: this ticket was originally just about adding the R package docs, but it 
seems that the JS and Java docs aren't getting built as part of the release 
process anymore, so that needs to be fixed.

 

Original description:

https://issues.apache.org/jira/browse/ARROW-5452 added the R pkgdown site 
config. Adding the wiring into the apidocs build scripts was deferred because 
there was some discussion about which workflow was supported and which was 
deprecated.  

Uwe says: "Have a look at 
[https://github.com/apache/arrow/blob/master/docs/Dockerfile] and 
[https://github.com/apache/arrow/blob/master/ci/docker_build_sphinx.sh] Add 
that and a docs-r entry in the main {{docker-compose.yml}} should be sufficient 
to get it running in the docker setup. But actually I would rather like to see 
that we also add the R build to the above linked files."

  was:
https://issues.apache.org/jira/browse/ARROW-5452 added the R pkgdown site 
config. Adding the wiring into the apidocs build scripts was deferred because 
there was some discussion about which workflow was supported and which was 
deprecated.  

Uwe says: "Have a look at 
[https://github.com/apache/arrow/blob/master/docs/Dockerfile] and 
[https://github.com/apache/arrow/blob/master/ci/docker_build_sphinx.sh] Add 
that and a docs-r entry in the main {{docker-compose.yml}} should be sufficient 
to get it running in the docker setup. But actually I would rather like to see 
that we also add the R build to the above linked files."


> [Release] Build and publish R/Java/JS docs
> --
>
> Key: ARROW-5497
> URL: https://issues.apache.org/jira/browse/ARROW-5497
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools, Documentation, Java, JavaScript, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> Edit: this ticket was originally just about adding the R package docs, but it 
> seems that the JS and Java docs aren't getting built as part of the release 
> process anymore, so that needs to be fixed.
>  
> Original description:
> https://issues.apache.org/jira/browse/ARROW-5452 added the R pkgdown site 
> config. Adding the wiring into the apidocs build scripts was deferred because 
> there was some discussion about which workflow was supported and which was 
> deprecated.  
> Uwe says: "Have a look at 
> [https://github.com/apache/arrow/blob/master/docs/Dockerfile] and 
> [https://github.com/apache/arrow/blob/master/ci/docker_build_sphinx.sh] Add 
> that and a docs-r entry in the main {{docker-compose.yml}} should be 
> sufficient to get it running in the docker setup. But actually I would rather 
> like to see that we also add the R build to the above linked files."





[jira] [Resolved] (ARROW-4990) [C++] Kernel to compare array with array

2019-06-05 Thread Micah Kornfield (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield resolved ARROW-4990.

Resolution: Fixed

Issue resolved by pull request 4398
[https://github.com/apache/arrow/pull/4398]

> [C++] Kernel to compare array with array
> 
>
> Key: ARROW-4990
> URL: https://issues.apache.org/jira/browse/ARROW-4990
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-5512) [C++] Draft initial public APIs for Datasets project

2019-06-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5512:
--
Labels: dataset pull-request-available  (was: dataset)

> [C++] Draft initial public APIs for Datasets project
> 
>
> Key: ARROW-5512
> URL: https://issues.apache.org/jira/browse/ARROW-5512
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: dataset, pull-request-available
> Fix For: 0.14.0
>
>
> The objective of this is to ensure general alignment with the discussion 
> document
> https://docs.google.com/document/d/1bVhzifD38qDypnSjtf8exvpP3sSB5x_Kw9m-n66FB2c/edit?usp=sharing
> so that work on an initial implementation can begin.





[jira] [Created] (ARROW-5519) Add ORC JNI related components to travis CI

2019-06-05 Thread Yurui Zhou (JIRA)
Yurui Zhou created ARROW-5519:
-

 Summary: Add ORC JNI related components to travis CI
 Key: ARROW-5519
 URL: https://issues.apache.org/jira/browse/ARROW-5519
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Java
Reporter: Yurui Zhou








[jira] [Assigned] (ARROW-5519) Add ORC JNI related components to travis CI

2019-06-05 Thread Yurui Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yurui Zhou reassigned ARROW-5519:
-

Assignee: Yurui Zhou

> Add ORC JNI related components to travis CI
> ---
>
> Key: ARROW-5519
> URL: https://issues.apache.org/jira/browse/ARROW-5519
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Java
>Reporter: Yurui Zhou
>Assignee: Yurui Zhou
>Priority: Major
>



