[jira] [Assigned] (ARROW-3571) [Wiki] Release management guide does not explain how to set up Crossbow or where to find instructions

2019-05-31 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-3571:
--

Assignee: Krisztian Szucs

> [Wiki] Release management guide does not explain how to set up Crossbow or 
> where to find instructions
> -
>
> Key: ARROW-3571
> URL: https://issues.apache.org/jira/browse/ARROW-3571
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Wiki
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 0.14.0
>
>
> If you follow the guide, at one point it says "Launch a Crossbow build" but 
> provides no link to the setup instructions for this step.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4309) [Release] gen_apidocs docker-compose task is out of date

2019-05-31 Thread Neal Richardson (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853147#comment-16853147
 ] 

Neal Richardson commented on ARROW-4309:


We need CUDA support to build API documentation?

> [Release] gen_apidocs docker-compose task is out of date
> 
>
> Key: ARROW-4309
> URL: https://issues.apache.org/jira/browse/ARROW-4309
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools, Documentation
>Reporter: Wes McKinney
>Priority: Major
>  Labels: docker
>
> This needs to be updated to build with CUDA support (which in turn will 
> require the host machine to have nvidia-docker), among other things.





[jira] [Updated] (ARROW-3758) [R] Build R library on Windows, document build instructions for Windows developers

2019-05-31 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-3758:
---
Fix Version/s: (was: 0.15.0)
   0.14.0

> [R] Build R library on Windows, document build instructions for Windows 
> developers
> --
>
> Key: ARROW-3758
> URL: https://issues.apache.org/jira/browse/ARROW-3758
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Assigned] (ARROW-3758) [R] Build R library on Windows, document build instructions for Windows developers

2019-05-31 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson reassigned ARROW-3758:
--

Assignee: Neal Richardson

> [R] Build R library on Windows, document build instructions for Windows 
> developers
> --
>
> Key: ARROW-3758
> URL: https://issues.apache.org/jira/browse/ARROW-3758
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Resolved] (ARROW-3294) [C++] Test Flight RPC on Windows / Appveyor

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3294.
-
Resolution: Fixed

Issue resolved by pull request 4410
[https://github.com/apache/arrow/pull/4410]

> [C++] Test Flight RPC on Windows / Appveyor
> ---
>
> Key: ARROW-3294
> URL: https://issues.apache.org/jira/browse/ARROW-3294
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: flight, pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-5464) [Archery] Bad --benchmark-filter default

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5464.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4434
[https://github.com/apache/arrow/pull/4434]

> [Archery] Bad --benchmark-filter default
> 
>
> Key: ARROW-5464
> URL: https://issues.apache.org/jira/browse/ARROW-5464
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-5469) [Go] implement read/write IPC for Date32/Date64 arrays

2019-05-31 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5469:
--

 Summary: [Go] implement read/write IPC for Date32/Date64 arrays
 Key: ARROW-5469
 URL: https://issues.apache.org/jira/browse/ARROW-5469
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-5467) [Go] implement read/write IPC for Time32/Time64 arrays

2019-05-31 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5467:
--

 Summary: [Go] implement read/write IPC for Time32/Time64 arrays
 Key: ARROW-5467
 URL: https://issues.apache.org/jira/browse/ARROW-5467
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Go
Reporter: Sebastien Binet








[jira] [Created] (ARROW-5468) [Go] implement read/write IPC for Timestamp arrays

2019-05-31 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5468:
--

 Summary: [Go] implement read/write IPC for Timestamp arrays
 Key: ARROW-5468
 URL: https://issues.apache.org/jira/browse/ARROW-5468
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Go
Reporter: Sebastien Binet








[jira] [Updated] (ARROW-5435) [Java] IntervalYearVector#getObject should return Period with both year and month

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-5435:

Summary: [Java] IntervalYearVector#getObject should return Period with both 
year and month  (was: IntervalYearVector#getObject should return Period with 
both year and month)

> [Java] IntervalYearVector#getObject should return Period with both year and 
> month
> -
>
> Key: ARROW-5435
> URL: https://issues.apache.org/jira/browse/ARROW-5435
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Ji Liu
>Assignee: Ji Liu
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> IntervalYearVector#getObject currently returns a Period with only the months 
> field set. However, this vector stores a year-month interval as a total month 
> count (e.g. 2 years and 3 months is stored as 27), so it should return a 
> Period with both years and months populated. 
> In the example above, it currently returns Period(27 months); I think it 
> should return Period(2 years, 3 months).
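The conversion the issue asks for amounts to splitting the stored month count into whole years and leftover months. A minimal Python sketch of that arithmetic (the function name is illustrative and not part of the Arrow Java API); on the Java side, `java.time.Period.ofMonths(27).normalized()` should produce the equivalent `P2Y3M`, keeping the same sign on both units.

```python
from typing import Tuple


def split_total_months(total_months: int) -> Tuple[int, int]:
    """Split a year-month interval stored as total months into (years, months).

    The sign is applied to both parts, so -27 becomes (-2, -3) rather than the
    (-3, 9) that a plain divmod on a negative number would give.
    """
    sign = -1 if total_months < 0 else 1
    years, months = divmod(abs(total_months), 12)
    return sign * years, sign * months


# 27 total months -> 2 years and 3 months
print(split_total_months(27))  # (2, 3)
```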





[jira] [Updated] (ARROW-5440) [Rust][Parquet] Rust Parquet requiring libstd-xxx.so dependency on centos

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-5440:

Summary: [Rust][Parquet] Rust Parquet requiring libstd-xxx.so dependency on 
centos  (was: Rust Parquet requiring libstd-xxx.so dependency on centos)

> [Rust][Parquet] Rust Parquet requiring libstd-xxx.so dependency on centos
> -
>
> Key: ARROW-5440
> URL: https://issues.apache.org/jira/browse/ARROW-5440
> Project: Apache Arrow
>  Issue Type: Bug
> Environment: CentOS Linux release 7.6.1810 (Core) 
>Reporter: Tenzin Rigden
>Priority: Major
> Attachments: parquet-test-libstd.tar.gz
>
>
> Hello,
> In the Rust Parquet implementation ([https://github.com/sunchao/parquet-rs]) 
> on CentOS, the binary created has a `libstd-hash.so` shared library 
> dependency that is causing issues, since that shared library lives in the 
> rustup directory. This `libstd-hash.so` dependency isn't present in any other 
> Rust binaries I've built before, and it means that I can't run this binary 
> anywhere rustup isn't installed with that exact libstd library.
> This is not an issue on Mac.
> I've attached the Rust files; the command-line output is below.
> {code:java|title=cli-output|borderStyle=solid}
> [centos@_ parquet-test]$ cat /etc/centos-release
> CentOS Linux release 7.6.1810 (Core)
> [centos@_ parquet-test]$ rustc --version
> rustc 1.36.0-nightly (e70d5386d 2019-05-27)
> [centos@_ parquet-test]$ ldd target/release/parquet-test
> linux-vdso.so.1 =>  (0x7ffd02fee000)
> libstd-44988553032616b2.so => not found
> librt.so.1 => /lib64/librt.so.1 (0x7f6ecd209000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x7f6eccfed000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7f6eccdd7000)
> libc.so.6 => /lib64/libc.so.6 (0x7f6ecca0a000)
> libm.so.6 => /lib64/libm.so.6 (0x7f6ecc708000)
> /lib64/ld-linux-x86-64.so.2 (0x7f6ecd8b1000)
> [centos@_ parquet-test]$ ls -l 
> ~/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libstd-44988553032616b2.so
> -rw-r--r--. 1 centos centos 5623568 May 27 21:46 
> /home/centos/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libstd-44988553032616b2.so
> {code}
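To catch this kind of dependency before shipping a binary, one option is to scan the `ldd` output for entries the loader cannot resolve. A minimal Python sketch, assuming output in the format shown above (the helper name `unresolved_libraries` is illustrative):

```python
from typing import List


def unresolved_libraries(ldd_output: str) -> List[str]:
    """Return the shared libraries that ldd reports as "not found"."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints lines such as "libfoo.so => not found" for unresolved deps
        name, sep, target = line.partition("=>")
        if sep and target.strip() == "not found":
            missing.append(name.strip())
    return missing


sample = """\
linux-vdso.so.1 =>  (0x7ffd02fee000)
libstd-44988553032616b2.so => not found
librt.so.1 => /lib64/librt.so.1 (0x7f6ecd209000)
"""
print(unresolved_libraries(sample))  # ['libstd-44988553032616b2.so']
```

A non-empty result means the dynamic loader cannot satisfy the binary's dependencies on that machine, which is the situation described in this report.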





[jira] [Created] (ARROW-5466) [Java] Combine Java CI builds into a common build with multiple JDKs

2019-05-31 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5466:
---

 Summary: [Java] Combine Java CI builds into a common build with 
multiple JDKs
 Key: ARROW-5466
 URL: https://issues.apache.org/jira/browse/ARROW-5466
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Wes McKinney
 Fix For: 0.14.0


The JDK 9 and 11 builds are fast (4 minutes each), so it would probably be more 
efficient to run all 3 JDK builds in a single build entry.





[jira] [Updated] (ARROW-5465) [Crossbow] Support writing job definition to a file on submit

2019-05-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5465:
--
Labels: pull-request-available  (was: )

> [Crossbow] Support writing job definition to a file on submit 
> --
>
> Key: ARROW-5465
> URL: https://issues.apache.org/jira/browse/ARROW-5465
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> Similar to what archery benchmark does. This is required in order to consume 
> the command's output from a buildbot build step.
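A minimal sketch of the requested behaviour in Python: the submit step optionally persists the job definition to a file, so that a later build step can read it instead of parsing console output. The `submit` and `load_job` names, the `output_path` parameter, and the JSON format are all hypothetical; the actual Crossbow job format and CLI options may differ.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


def submit(job: Dict[str, Any], output_path: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical submit step that can also write the job definition to a file."""
    # ... the real submission logic would run here ...
    if output_path is not None:
        with open(output_path, "w", encoding="utf-8") as fp:
            json.dump(job, fp, indent=2, sort_keys=True)
    return job


def load_job(path: str) -> Dict[str, Any]:
    """What a downstream build step (e.g. in buildbot) could do with the file."""
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)


# Usage: write the definition on submit, then read it back as a later step would.
job = {"branch": "example-branch", "tasks": ["task-a", "task-b"]}
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "job.json")
    submit(job, output_path=path)
    loaded = load_job(path)
print(loaded == job)  # True
```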





[jira] [Created] (ARROW-5465) [Crossbow] Support writing job definition to a file on submit

2019-05-31 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-5465:
--

 Summary: [Crossbow] Support writing job definition to a file on 
submit 
 Key: ARROW-5465
 URL: https://issues.apache.org/jira/browse/ARROW-5465
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs


Similar to what archery benchmark does. This is required in order to consume 
the command's output from a buildbot build step.





[jira] [Updated] (ARROW-4419) [Flight] Test Flight servers and clients with a generic gRPC services

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4419:

Fix Version/s: (was: 0.14.0)

> [Flight] Test Flight servers and clients with a generic gRPC services
> -
>
> Key: ARROW-4419
> URL: https://issues.apache.org/jira/browse/ARROW-4419
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: FlightRPC
>Reporter: David Li
>Priority: Minor
>  Labels: flight
>
> The Java implementation will fail to decode a schema message if the message 
> also contains (empty) body buffers (see ArrowMessage.asSchema's precondition 
> checks). However, clients using default Protobuf serialization will likely 
> write an empty body buffer by default.
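The tolerant behaviour this issue points toward boils down to treating a present-but-empty body the same as an absent one when decoding a schema message. A minimal Python sketch, assuming a simplified message type (`FlightDataSketch` and `schema_header` are hypothetical stand-ins, not the Arrow Java API):

```python
from dataclasses import dataclass


@dataclass
class FlightDataSketch:
    """Hypothetical, simplified stand-in for a FlightData message."""
    data_header: bytes
    data_body: bytes = b""


def schema_header(msg: FlightDataSketch) -> bytes:
    """Return the header of a schema message, tolerating an empty body.

    Rejecting any message that has a body field at all would fail for clients
    that always serialize the field; rejecting only a non-empty body accepts both.
    """
    if msg.data_body:
        raise ValueError("schema messages must not carry body buffers")
    return msg.data_header
```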





[jira] [Updated] (ARROW-4419) [Flight] Test Flight servers and clients with a generic gRPC services

2019-05-31 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4419:

Summary: [Flight] Test Flight servers and clients with a generic gRPC 
services  (was: [Flight] Deal with body buffers in FlightData)

> [Flight] Test Flight servers and clients with a generic gRPC services
> -
>
> Key: ARROW-4419
> URL: https://issues.apache.org/jira/browse/ARROW-4419
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: FlightRPC
>Reporter: David Li
>Priority: Minor
>  Labels: flight
> Fix For: 0.14.0
>
>
> The Java implementation will fail to decode a schema message if the message 
> also contains (empty) body buffers (see ArrowMessage.asSchema's precondition 
> checks). However, clients using default Protobuf serialization will likely 
> write an empty body buffer by default.





[jira] [Commented] (ARROW-3978) [C++] Implement hashing, dictionary-encoding for StructArray

2019-05-31 Thread Francois Saint-Jacques (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853069#comment-16853069
 ] 

Francois Saint-Jacques commented on ARROW-3978:
---

ClickHouse also uses the pivot method, see
 * 
[https://github.com/yandex/ClickHouse/blob/d4f474cd196c1b7dff65f8507e87b380f64e2b53/dbms/src/Common/ColumnsHashing.h#L513-L540]
 * 
https://github.com/yandex/ClickHouse/blob/d4f474cd196c1b7dff65f8507e87b380f64e2b53/dbms/src/Interpreters/AggregationCommon.h#L231-L243
 * 
https://github.com/yandex/ClickHouse/blob/d4f474cd196c1b7dff65f8507e87b380f64e2b53/dbms/src/Columns/ColumnVector.cpp#L32-L45

> [C++] Implement hashing, dictionary-encoding for StructArray
> 
>
> Key: ARROW-3978
> URL: https://issues.apache.org/jira/browse/ARROW-3978
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> This is a central requirement for hash-aggregations such as
> {code}
> SELECT AGG_FUNCTION(expr)
> FROM table
> GROUP BY expr1, expr2, ...
> {code}
> The materialized keys in the GROUP BY section form a struct, which can be 
> incrementally hashed to produce dictionary codes suitable for computing 
> aggregates or any other purpose. 
> There are a few subtasks related to this, such as efficiently constructing a 
> record (that can be hashed quickly) to identify each "row" in the struct. 
> Maybe we should start with that first.
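The data flow described above (materialize each GROUP BY row as a record, hash it to a dictionary code, then aggregate by code) can be sketched in a few lines of Python. This uses a plain dict as the hash table and tuples as the per-row record; it only illustrates the idea and is not how the C++ kernels would be implemented.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def dictionary_encode_rows(*columns: Sequence[Hashable]) -> Tuple[List[int], List[tuple]]:
    """Assign a dense integer code to each distinct row of the key columns.

    Each struct "row" is materialized as a tuple and hashed; the first time a
    tuple is seen it gets the next code, so codes follow first appearance.
    """
    dictionary: Dict[tuple, int] = {}
    codes: List[int] = []
    for row in zip(*columns):
        codes.append(dictionary.setdefault(row, len(dictionary)))
    return codes, list(dictionary)  # dicts keep insertion (= code) order


def grouped_sum(codes: List[int], values: Sequence[int], num_groups: int) -> List[int]:
    """Hash-aggregation step: accumulate each value into its group's slot."""
    sums = [0] * num_groups
    for code, value in zip(codes, values):
        sums[code] += value
    return sums


# SELECT SUM(v) FROM t GROUP BY k1, k2
k1 = ["x", "y", "x", "y"]
k2 = [1, 2, 1, 1]
v = [10, 20, 30, 40]
codes, uniques = dictionary_encode_rows(k1, k2)
print(codes)    # [0, 1, 0, 2]
print(uniques)  # [('x', 1), ('y', 2), ('y', 1)]
print(grouped_sum(codes, v, len(uniques)))  # [40, 20, 40]
```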





[jira] [Updated] (ARROW-5464) [Archery] Bad --benchmark-filter default

2019-05-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5464:
--
Labels: pull-request-available  (was: )

> [Archery] Bad --benchmark-filter default
> 
>
> Key: ARROW-5464
> URL: https://issues.apache.org/jira/browse/ARROW-5464
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Trivial
>  Labels: pull-request-available
>






[jira] [Created] (ARROW-5464) [Archery] Bad --benchmark-filter default

2019-05-31 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5464:
-

 Summary: [Archery] Bad --benchmark-filter default
 Key: ARROW-5464
 URL: https://issues.apache.org/jira/browse/ARROW-5464
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques








[jira] [Created] (ARROW-5463) [Rust] Implement AsRef for Buffer

2019-05-31 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5463:
-

 Summary: [Rust] Implement AsRef for Buffer
 Key: ARROW-5463
 URL: https://issues.apache.org/jira/browse/ARROW-5463
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu


Implement AsRef ArrowNativeType for Buffer





[jira] [Commented] (ARROW-4419) [Flight] Deal with body buffers in FlightData

2019-05-31 Thread David Li (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852946#comment-16852946
 ] 

David Li commented on ARROW-4419:
-

The Java/C++ compatibility issue is resolved, but we haven't tested with a 
generic gRPC client. We could consider this fixed for Flight and move the 
testing to a separate issue.

> [Flight] Deal with body buffers in FlightData
> -
>
> Key: ARROW-4419
> URL: https://issues.apache.org/jira/browse/ARROW-4419
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: FlightRPC
>Reporter: David Li
>Priority: Minor
>  Labels: flight
> Fix For: 0.14.0
>
>
> The Java implementation will fail to decode a schema message if the message 
> also contains (empty) body buffers (see ArrowMessage.asSchema's precondition 
> checks). However, clients using default Protobuf serialization will likely 
> write an empty body buffer by default.





[jira] [Commented] (ARROW-5430) [Python] Can read but not write parquet partitioned on large ints

2019-05-31 Thread JIRA


[ 
https://issues.apache.org/jira/browse/ARROW-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852944#comment-16852944
 ] 

Robin Kåveland commented on ARROW-5430:
---

Aha, it looks like maybe we can just use this macro instead: 
[https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/common.h#L53]. 
I'll check if that does the trick.

> [Python] Can read but not write parquet partitioned on large ints
> -
>
> Key: ARROW-5430
> URL: https://issues.apache.org/jira/browse/ARROW-5430
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.13.0
> Environment: Mac OSX 10.14.4, Python 3.7.1, x86_64.
>Reporter: Robin Kåveland
>Priority: Minor
>  Labels: parquet
>
> Here's a contrived example that reproduces this issue using pandas:
> {code:python}
> import numpy as np
> import pandas as pd
> real_usernames = np.array(['anonymize', 'me'])
> usernames = pd.util.hash_array(real_usernames)
> login_count = [13, 9]
> df = pd.DataFrame({'user': usernames, 'logins': login_count})
> df.to_parquet('can_write.parq', partition_cols=['user'])
> # But not read
> pd.read_parquet('can_write.parq'){code}
> Expected behaviour:
>  * Either the write fails
>  * Or the read succeeds
> Actual behaviour: The read fails with the following error:
> {code:java}
> Traceback (most recent call last):
>   File "", line 2, in 
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pandas/io/parquet.py",
>  line 282, in read_parquet
>     return impl.read(path, columns=columns, **kwargs)
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pandas/io/parquet.py",
>  line 129, in read
>     **kwargs).to_pandas()
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py",
>  line 1152, in read_table
>     use_pandas_metadata=use_pandas_metadata)
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/filesystem.py",
>  line 181, in read_parquet
>     use_pandas_metadata=use_pandas_metadata)
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py",
>  line 1014, in read
>     use_pandas_metadata=use_pandas_metadata)
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py",
>  line 587, in read
>     dictionary = partitions.levels[i].dictionary
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py",
>  line 642, in dictionary
>     dictionary = lib.array(integer_keys)
>   File "pyarrow/array.pxi", line 173, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 36, in pyarrow.lib._sequence_to_array
>   File "pyarrow/error.pxi", line 104, in pyarrow.lib.check_status
> pyarrow.lib.ArrowException: Unknown error: Python int too large to convert to 
> C long{code}
> I set the priority to minor here because it's easy enough to work around this 
> in user code unless you really need the 64-bit hash (and you probably 
> shouldn't be partitioning on that anyway).
> I could take a stab at writing a patch for this if there's interest?
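For context: `pd.util.hash_array` produces unsigned 64-bit hashes, and roughly half of them exceed the signed 64-bit range of a C long on the reporter's platform (macOS x86_64), which matches the "Python int too large to convert to C long" error. One possible user-side workaround, sketched below as an assumption rather than anything pandas or pyarrow provides, is to reinterpret each hash as a signed 64-bit value (same bits, two's complement) before partitioning; the helper names are illustrative.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
UINT64_MOD = 2**64


def fits_in_int64(value: int) -> bool:
    """True if value can be stored in a signed 64-bit integer (a C long on LP64)."""
    return INT64_MIN <= value <= INT64_MAX


def uint64_to_int64(value: int) -> int:
    """Reinterpret an unsigned 64-bit hash as signed (two's complement), same bits."""
    if not 0 <= value < UINT64_MOD:
        raise ValueError("expected an unsigned 64-bit value")
    return value - UINT64_MOD if value > INT64_MAX else value


def int64_to_uint64(value: int) -> int:
    """Inverse of uint64_to_int64: recover the original unsigned hash."""
    return value % UINT64_MOD


h = 2**64 - 1                   # largest possible uint64 hash
print(fits_in_int64(h))         # False: this is the kind of value that overflows
s = uint64_to_int64(h)
print(s, fits_in_int64(s))      # -1 True
print(int64_to_uint64(s) == h)  # True
```

With numpy, the same reinterpretation of a whole uint64 array is `arr.view(np.int64)`.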





[jira] [Comment Edited] (ARROW-5430) [Python] Can read but not write parquet partitioned on large ints

2019-05-31 Thread JIRA


[ 
https://issues.apache.org/jira/browse/ARROW-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852939#comment-16852939
 ] 

Robin Kåveland edited comment on ARROW-5430 at 5/31/19 11:51 AM:
-

Edit: Very sorry about the formatting here. I haven't used Jira for many years 
and apparently it shows; I can't manage to insert links correctly. :(

Okay, I must admit to being a bit stumped here. I followed the trail from 
{{_sequence_to_array}} to find out where the {{ArrowUnknown}} is coming from, 
and I'm quite sure it must be CIntFromPythonImpl: 
[https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/helpers.cc#L179]

Here, we call some CPython APIs, namely PyLong_AsLong 
([https://docs.python.org/3/c-api/long.html#c.PyLong_AsLong]) and 
PyLong_AsLongLong, both of which return {{-1}} and set an {{OverflowError}} 
on overflow. Then, we call {{RETURN_IF_PYERROR}} in the case where we get 
{{-1}}. 
[This|https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/common.h#L36-L51]
block of code looks like it could be the right place to make the change. But 
now I'm very much on thin ice, as I don't know much C++ at all and I'm also not 
very familiar with the CPython C API.

It was easy enough to add a test case in pure Python. I'm guessing the right 
"fix" would be something like adding a branch to {{ConvertPyError}} that checks 
{{PyErr_ExceptionMatches(PyExc_OverflowError)}}?
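The overflow signal that the proposed branch would look for is also visible from pure Python. A minimal analogue of the idea (not the Arrow C++ code; `to_int64` is a hypothetical name), assuming CPython, where storing into an `array.array` of typecode "q" goes through a C long long conversion that raises OverflowError for out-of-range values:

```python
import array


def to_int64(value: int) -> int:
    """Convert a Python int to a signed 64-bit C integer, reporting overflow clearly.

    array.array("q", ...) stores C long long values, so CPython performs a
    C-level integer conversion and raises OverflowError if the value does not fit.
    """
    try:
        return array.array("q", [value])[0]
    except OverflowError as exc:
        # Analogue of the proposed ConvertPyError branch: turn the overflow into
        # a specific, descriptive error instead of a generic "Unknown error".
        raise ValueError(f"{value} does not fit in a signed 64-bit integer") from exc


print(to_int64(2**63 - 1))  # 9223372036854775807
```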


was (Author: kaaveland):
Okay, I must admit to being a bit stumped here. I followed the trail from 
{{_sequence_to_array}} to find out where the {{ArrowUnknown}} is coming from, 
and I'm quite sure it must be 
[CIntFromPythonImpl|https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/helpers.cc#L179].
Here, we call some CPython APIs, namely 
[PyLong_AsLong|https://docs.python.org/3/c-api/long.html#c.PyLong_AsLong] and 
PyLong_AsLongLong, both of which return {{-1}} and set an {{OverflowError}} 
on overflow. Then, we call {{RETURN_IF_PYERROR}} in the case where we get {{-1}}. 
[This|https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/common.h#L36-L51]
block of code looks like it could be the right place to make the change. But 
now I'm very much on thin ice, as I don't know much C++ at all and I'm also not 
very familiar with the CPython C API.

I'm guessing the right "fix" would be something like adding a branch to 
{{ConvertPyError}} that checks {{PyErr_ExceptionMatches(PyExc_OverflowError)}}?


[jira] [Comment Edited] (ARROW-5430) [Python] Can read but not write parquet partitioned on large ints

2019-05-31 Thread JIRA


[ 
https://issues.apache.org/jira/browse/ARROW-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852939#comment-16852939
 ] 

Robin Kåveland edited comment on ARROW-5430 at 5/31/19 11:48 AM:
-

Okay, I must admit to being a bit stumped here. I followed the trail from 
{{_sequence_to_array}} to find out where the {{ArrowUnknown}} is coming from, 
and I'm quite sure it must be 
[CIntFromPythonImpl|https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/helpers.cc#L179].
Here, we call some CPython APIs, namely 
[PyLong_AsLong|https://docs.python.org/3/c-api/long.html#c.PyLong_AsLong] and 
PyLong_AsLongLong, both of which return {{-1}} and set an {{OverflowError}} 
on overflow. Then, we call {{RETURN_IF_PYERROR}} in the case where we get {{-1}}. 
[This|https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/common.h#L36-L51]
block of code looks like it could be the right place to make the change. But 
now I'm very much on thin ice, as I don't know much C++ at all and I'm also not 
very familiar with the CPython C API.

I'm guessing the right "fix" would be something like adding a branch to 
{{ConvertPyError}} that checks {{PyErr_ExceptionMatches(PyExc_OverflowError)}}?


was (Author: kaaveland):
Okay, I must admit to being a bit stumped here. I followed the trail from 
{{_sequence_to_array}} to find out where the {{ArrowUnknown}} is coming from, 
and I'm quite sure it must be 
[CIntFromPythonImpl|https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/helpers.cc#L179].
Here, we call some CPython APIs, namely 
[PyLong_AsLong|https://docs.python.org/3/c-api/long.html#c.PyLong_AsLong] and 
PyLong_AsLongLong, both of which return {{-1}} and set an {{OverflowError}} 
on overflow. Then, we call {{RETURN_IF_PYERROR}} in the case where we get {{-1}}. 
[This|https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/common.h#L36-L51]
block of code looks like it could be the right place to make the change. But 
now I'm very much on thin ice, as I don't know much C++ at all and I'm also not 
very familiar with the CPython C API.

I'm guessing the right "fix" would be something like adding a branch to 
{{ConvertPyError}} that checks {{PyErr_ExceptionMatches(PyExc_OverflowError)}}?


[jira] [Comment Edited] (ARROW-5430) [Python] Can read but not write parquet partitioned on large ints

2019-05-31 Thread JIRA


[ 
https://issues.apache.org/jira/browse/ARROW-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852939#comment-16852939
 ] 

Robin Kåveland edited comment on ARROW-5430 at 5/31/19 11:48 AM:
-

Okay, I must admit to being a bit stumped here. I followed the trail from 
{{_sequence_to_array}} to find out where the {{ArrowUnknown}} is coming from, and 
I'm quite sure it must be 
[CIntFromPythonImpl|https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/helpers.cc#L179].
 Here, we call some CPython APIs, namely 
[PyLong_AsLong|https://docs.python.org/3/c-api/long.html#c.PyLong_AsLong] and 
PyLong_AsLongLong, both of which return {{-1}} on overflow. Then, we call 
{{RETURN_IF_PYERROR}} in the case where we get {{-1}}. 
[This|https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/common.h#L36-L51]
 block of code looks like it could be the right place to make the change, but 
now I'm very much on thin ice, as I don't know much C++ at all and I'm also not 
very familiar with the CPython C API.

I'm guessing the right "fix" would be something like adding a branch to 
{{ConvertPyError}} that checks 
{{PyErr_ExceptionMatches(PyExc_OverflowError)}}?


was (Author: kaaveland):
Okay, I must admit to being a bit stumped here. I followed the trail from 
{{_sequence_to_array}} to find out where the {{ArrowUnknown}} is coming from, and 
I'm quite sure it must be 
[CIntFromPythonImpl|https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/helpers.cc#L179].
 Here, we call some CPython APIs, namely 
[PyLong_AsLong|https://docs.python.org/3/c-api/long.html#c.PyLong_AsLong] and 
PyLong_AsLongLong, both of which return {{-1}} on overflow. Then, we call 
{{RETURN_IF_PYERROR}} in the case where we get {{-1}}. 
[This|https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/common.h#L36-L51]
 block of code looks like it could be the right place to make the change, but 
now I'm very much on thin ice, as I don't know much C++ at all and I'm also not 
very familiar with the CPython C API.

I'm guessing the right "fix" would be something like adding a branch to 
{{ConvertPyError}} that checks 
{{PyErr_ExceptionMatches(PyExc_OverflowError)}}?

> [Python] Can read but not write parquet partitioned on large ints
> -
>
> Key: ARROW-5430
> URL: https://issues.apache.org/jira/browse/ARROW-5430
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.13.0
> Environment: Mac OSX 10.14.4, Python 3.7.1, x86_64.
>Reporter: Robin Kåveland
>Priority: Minor
>  Labels: parquet
>
> Here's a contrived example that reproduces this issue using pandas:
> {code:python}
> import numpy as np
> import pandas as pd
> real_usernames = np.array(['anonymize', 'me'])
> # hash_array returns uint64 values, which may exceed the int64 range
> usernames = pd.util.hash_array(real_usernames)
> login_count = [13, 9]
> df = pd.DataFrame({'user': usernames, 'logins': login_count})
> # Writing succeeds...
> df.to_parquet('can_write.parq', partition_cols=['user'])
> # ...but reading fails
> pd.read_parquet('can_write.parq'){code}
> Expected behaviour:
>  * Either the write fails
>  * Or the read succeeds
> Actual behaviour: The read fails with the following error:
> {code}
> Traceback (most recent call last):
>   File "", line 2, in 
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pandas/io/parquet.py",
>  line 282, in read_parquet
>     return impl.read(path, columns=columns, **kwargs)
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pandas/io/parquet.py",
>  line 129, in read
>     **kwargs).to_pandas()
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py",
>  line 1152, in read_table
>     use_pandas_metadata=use_pandas_metadata)
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/filesystem.py",
>  line 181, in read_parquet
>     use_pandas_metadata=use_pandas_metadata)
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py",
>  line 1014, in read
>     use_pandas_metadata=use_pandas_metadata)
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py",
>  line 587, in read
>     dictionary = partitions.levels[i].dictionary
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py",
>  line 642, in dictionary
>     dictionary = lib.array(integer_keys)
>   File "pyarrow/array.pxi", line 173, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 36, in pyarrow.lib._sequence_to_array
>   File "pyarrow/error.pxi", line 104, in pyarrow.lib.check_status
> pyarrow.lib.ArrowException: Unknown error: Python int too large to convert to 
> C long{code}
> I set the priority 

[jira] [Commented] (ARROW-5430) [Python] Can read but not write parquet partitioned on large ints

2019-05-31 Thread JIRA


[ 
https://issues.apache.org/jira/browse/ARROW-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852939#comment-16852939
 ] 

Robin Kåveland commented on ARROW-5430:
---

Okay, I must admit to being a bit stumped here. I followed the trail from 
{{_sequence_to_array}} to find out where the {{ArrowUnknown}} is coming from, and 
I'm quite sure it must be 
[CIntFromPythonImpl|https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/helpers.cc#L179].
 Here, we call some CPython APIs, namely 
[PyLong_AsLong|https://docs.python.org/3/c-api/long.html#c.PyLong_AsLong] and 
PyLong_AsLongLong, both of which return {{-1}} on overflow. Then, we call 
{{RETURN_IF_PYERROR}} in the case where we get {{-1}}. 
[This|https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/common.h#L36-L51]
 block of code looks like it could be the right place to make the change, but 
now I'm very much on thin ice, as I don't know much C++ at all and I'm also not 
very familiar with the CPython C API.

I'm guessing the right "fix" would be something like adding a branch to 
{{ConvertPyError}} that checks 
{{PyErr_ExceptionMatches(PyExc_OverflowError)}}?
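
The error path described in this comment can be modelled in a few lines of Python. This is a simplified, hypothetical sketch, not Arrow's C++ code: `as_long_long` stands in for `PyLong_AsLongLong` (returning -1 and recording an `OverflowError` on overflow, with the message seen in the traceback), `c_int_from_python` stands in for the call site that checks for a pending error when it sees -1, and `convert_py_error` with its status names is a stand-in for `ConvertPyError`. The `"Invalid"` status for the proposed overflow branch is only one plausible choice.

```python
# Simplified, hypothetical model of the overflow error path; none of these
# names are Arrow's or CPython's real API.

INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1


class ErrorIndicator:
    """Stands in for CPython's per-thread "current exception" slot."""

    def __init__(self):
        self.exc = None


def as_long_long(value, err):
    """Mimic PyLong_AsLongLong: on overflow, set an error and return -1."""
    if not INT64_MIN <= value <= INT64_MAX:
        err.exc = OverflowError("Python int too large to convert to C long")
        return -1
    return value


def convert_py_error(err, handle_overflow):
    """Turn the pending exception into a (status, message) pair and clear it."""
    exc, err.exc = err.exc, None
    if handle_overflow and isinstance(exc, OverflowError):
        # The proposed extra branch: report a specific status.
        return ("Invalid", str(exc))
    # Without that branch, the error is reported generically.
    return ("UnknownError", str(exc))


def c_int_from_python(value, handle_overflow=False):
    """Mimic the call site: -1 is only an error if an exception is pending."""
    err = ErrorIndicator()
    result = as_long_long(value, err)
    if result == -1 and err.exc is not None:  # like RETURN_IF_PYERROR
        return convert_py_error(err, handle_overflow)
    return ("OK", result)
```

In this model, `c_int_from_python(2 ** 64 - 1)` returns `("UnknownError", ...)`, matching the generic error in the traceback, while passing `handle_overflow=True` yields `("Invalid", ...)`. A legitimate -1 input still returns `("OK", -1)`, which is why the pending-error check is needed.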

> [Python] Can read but not write parquet partitioned on large ints
> -
>
> Key: ARROW-5430
> URL: https://issues.apache.org/jira/browse/ARROW-5430
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.13.0
> Environment: Mac OSX 10.14.4, Python 3.7.1, x86_64.
>Reporter: Robin Kåveland
>Priority: Minor
>  Labels: parquet
>
> Here's a contrived example that reproduces this issue using pandas:
> {code:python}
> import numpy as np
> import pandas as pd
> real_usernames = np.array(['anonymize', 'me'])
> # hash_array returns uint64 values, which may exceed the int64 range
> usernames = pd.util.hash_array(real_usernames)
> login_count = [13, 9]
> df = pd.DataFrame({'user': usernames, 'logins': login_count})
> # Writing succeeds...
> df.to_parquet('can_write.parq', partition_cols=['user'])
> # ...but reading fails
> pd.read_parquet('can_write.parq'){code}
> Expected behaviour:
>  * Either the write fails
>  * Or the read succeeds
> Actual behaviour: The read fails with the following error:
> {code}
> Traceback (most recent call last):
>   File "", line 2, in 
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pandas/io/parquet.py",
>  line 282, in read_parquet
>     return impl.read(path, columns=columns, **kwargs)
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pandas/io/parquet.py",
>  line 129, in read
>     **kwargs).to_pandas()
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py",
>  line 1152, in read_table
>     use_pandas_metadata=use_pandas_metadata)
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/filesystem.py",
>  line 181, in read_parquet
>     use_pandas_metadata=use_pandas_metadata)
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py",
>  line 1014, in read
>     use_pandas_metadata=use_pandas_metadata)
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py",
>  line 587, in read
>     dictionary = partitions.levels[i].dictionary
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py",
>  line 642, in dictionary
>     dictionary = lib.array(integer_keys)
>   File "pyarrow/array.pxi", line 173, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 36, in pyarrow.lib._sequence_to_array
>   File "pyarrow/error.pxi", line 104, in pyarrow.lib.check_status
> pyarrow.lib.ArrowException: Unknown error: Python int too large to convert to 
> C long{code}
> I set the priority to minor here because it's easy enough to work around this 
> in user code unless you really need the 64-bit hash (and you probably 
> shouldn't be partitioning on that anyway).
> I could take a stab at writing a patch for this if there's interest?
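
The report notes that this is easy to work around in user code. One possible workaround, sketched below in plain Python under the assumption that the partition keys are unsigned 64-bit hashes like those from `pd.util.hash_array`, is to reinterpret each key as a signed 64-bit integer before writing. The integers parsed back from the directory names should then fit in a C long (64-bit on the reporter's macOS x86_64 platform). With NumPy, the same reinterpretation should be `usernames.view(np.int64)`. All helper names here are hypothetical.

```python
import struct

INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1


def fits_in_int64(n):
    """True if n fits in a signed 64-bit integer."""
    return INT64_MIN <= n <= INT64_MAX


def uint64_to_int64(n):
    """Reinterpret an unsigned 64-bit value as signed (two's complement)."""
    return struct.unpack("<q", struct.pack("<Q", n))[0]


def int64_to_uint64(n):
    """Undo uint64_to_int64, recovering the original unsigned hash."""
    return struct.unpack("<Q", struct.pack("<q", n))[0]


def partition_dir(column, key):
    """Hypothetical helper: Hive-style 'column=value' name for a uint64 key."""
    return "{}={}".format(column, uint64_to_int64(key))
```

The reinterpretation is lossless: for example, the largest uint64 value becomes -1 and maps back exactly, so the original hash can be recovered after reading if it is needed.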



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5462) [Go] support writing zero-length List

2019-05-31 Thread Sebastien Binet (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastien Binet resolved ARROW-5462.

   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4433
[https://github.com/apache/arrow/pull/4433]

> [Go] support writing zero-length List
> -
>
> Key: ARROW-5462
> URL: https://issues.apache.org/jira/browse/ARROW-5462
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Go
>Reporter: Sebastien Binet
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5462) [Go] support writing zero-length List

2019-05-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5462:
--
Labels: pull-request-available  (was: )

> [Go] support writing zero-length List
> -
>
> Key: ARROW-5462
> URL: https://issues.apache.org/jira/browse/ARROW-5462
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Go
>Reporter: Sebastien Binet
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5459) [Go] implement Stringer for Float16 DataType

2019-05-31 Thread Sebastien Binet (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastien Binet resolved ARROW-5459.

   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4431
[https://github.com/apache/arrow/pull/4431]

> [Go] implement Stringer for Float16 DataType
> 
>
> Key: ARROW-5459
> URL: https://issues.apache.org/jira/browse/ARROW-5459
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Go
>Reporter: Sebastien Binet
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1837) [Java] Unable to read unsigned integers outside signed range for bit width in integration tests

2019-05-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1837:
--
Labels: columnar-format-1.0 pull-request-available  (was: 
columnar-format-1.0)

> [Java] Unable to read unsigned integers outside signed range for bit width in 
> integration tests
> ---
>
> Key: ARROW-1837
> URL: https://issues.apache.org/jira/browse/ARROW-1837
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Wes McKinney
>Assignee: Micah Kornfield
>Priority: Blocker
>  Labels: columnar-format-1.0, pull-request-available
> Fix For: 0.14.0
>
> Attachments: generated_primitive.json
>
>
> I believe this was introduced recently (perhaps in the refactors), but a 
> problem where the integration tests weren't being run properly hid the 
> error from us.
> See https://github.com/apache/arrow/pull/1294#issuecomment-345553066
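
The title refers to unsigned values that do not fit in the signed type of the same bit width, for example a `uint8` of 200. Java has no unsigned primitive types, so one likely way this manifests is that such a value read into a signed type comes out negative unless the bits are masked back, as Java's `Byte.toUnsignedInt` and `Integer.toUnsignedLong` do. The Python sketch below only illustrates that effect using `struct`; it is not the Java reader code, and the helper names are hypothetical.

```python
import struct

# struct format codes per byte width: (signed, unsigned)
FORMATS = {1: ("b", "B"), 2: ("h", "H"), 4: ("i", "I"), 8: ("q", "Q")}


def read_as_signed(value, width):
    """Store value as an unsigned integer, then read the same bytes as signed."""
    signed_fmt, unsigned_fmt = FORMATS[width]
    raw = struct.pack("<" + unsigned_fmt, value)
    return struct.unpack("<" + signed_fmt, raw)[0]


def to_unsigned(signed_value, width):
    """Recover the unsigned value by masking to the bit width."""
    return signed_value & ((1 << (8 * width)) - 1)
```

Values inside the signed range are unaffected; only those at or above `2 ** (8 * width - 1)` come out negative and need the mask.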



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5459) [Go] implement Stringer for Float16 DataType

2019-05-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5459:
--
Labels: pull-request-available  (was: )

> [Go] implement Stringer for Float16 DataType
> 
>
> Key: ARROW-5459
> URL: https://issues.apache.org/jira/browse/ARROW-5459
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Go
>Reporter: Sebastien Binet
>Assignee: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5459) [Go] implement Stringer for Float16 DataType

2019-05-31 Thread Sebastien Binet (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastien Binet updated ARROW-5459:
---
Summary: [Go] implement Stringer for Float16 DataType  (was: [Go] implement 
Stringer for Float16 (array+dtype))

> [Go] implement Stringer for Float16 DataType
> 
>
> Key: ARROW-5459
> URL: https://issues.apache.org/jira/browse/ARROW-5459
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Go
>Reporter: Sebastien Binet
>Assignee: Sebastien Binet
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (ARROW-5460) [Java] Add micro-benchmarks for Float8Vector and allocators

2019-05-31 Thread Liya Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liya Fan closed ARROW-5460.
---
Resolution: Duplicate

> [Java] Add micro-benchmarks for Float8Vector and allocators
> ---
>
> Key: ARROW-5460
> URL: https://issues.apache.org/jira/browse/ARROW-5460
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Minor
>
> Over the past few days, we have been working on several performance-related 
> issues. In the process, we created some performance benchmarks to help us 
> verify performance results.
> We now want to add these micro-benchmarks to the code base, in the hope that 
> they will help us make performance-related decisions and avoid 
> performance degradation. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5461) [Java] Add micro-benchmarks for Float8Vector and allocators

2019-05-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5461:
--
Labels: pull-request-available  (was: )

> [Java] Add micro-benchmarks for Float8Vector and allocators
> ---
>
> Key: ARROW-5461
> URL: https://issues.apache.org/jira/browse/ARROW-5461
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Minor
>  Labels: pull-request-available
>
> Over the past few days, we have been working on several performance-related 
> issues. In the process, we created some performance benchmarks to help us 
> verify performance results.
> We now want to add these micro-benchmarks to the code base, in the hope that 
> they will help us make performance-related decisions and avoid 
> performance degradation. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5461) [Java] Add micro-benchmarks for Float8Vector and allocators

2019-05-31 Thread Liya Fan (JIRA)
Liya Fan created ARROW-5461:
---

 Summary: [Java] Add micro-benchmarks for Float8Vector and 
allocators
 Key: ARROW-5461
 URL: https://issues.apache.org/jira/browse/ARROW-5461
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Liya Fan
Assignee: Liya Fan


Over the past few days, we have been working on several performance-related 
issues. In the process, we created some performance benchmarks to help us 
verify performance results.

We now want to add these micro-benchmarks to the code base, in the hope that 
they will help us make performance-related decisions and avoid 
performance degradation. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5460) [Java] Add micro-benchmarks for Float8Vector and allocators

2019-05-31 Thread Liya Fan (JIRA)
Liya Fan created ARROW-5460:
---

 Summary: [Java] Add micro-benchmarks for Float8Vector and 
allocators
 Key: ARROW-5460
 URL: https://issues.apache.org/jira/browse/ARROW-5460
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Liya Fan
Assignee: Liya Fan


Over the past few days, we have been working on several performance-related 
issues. In the process, we created some performance benchmarks to help us 
verify performance results.

We now want to add these micro-benchmarks to the code base, in the hope that 
they will help us make performance-related decisions and avoid 
performance degradation. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-5459) [Go] implement Stringer for Float16 (array+dtype)

2019-05-31 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5459:
--

 Summary: [Go] implement Stringer for Float16 (array+dtype)
 Key: ARROW-5459
 URL: https://issues.apache.org/jira/browse/ARROW-5459
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go
Reporter: Sebastien Binet
Assignee: Sebastien Binet






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-3680) [Go] implement Float16 array

2019-05-31 Thread Sebastien Binet (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastien Binet resolved ARROW-3680.

   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4083
[https://github.com/apache/arrow/pull/4083]

> [Go] implement Float16 array
> 
>
> Key: ARROW-3680
> URL: https://issues.apache.org/jira/browse/ARROW-3680
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5384) [Go] add FixedSizeList array

2019-05-31 Thread Sebastien Binet (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastien Binet resolved ARROW-5384.

   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4357
[https://github.com/apache/arrow/pull/4357]

> [Go] add FixedSizeList array
> 
>
> Key: ARROW-5384
> URL: https://issues.apache.org/jira/browse/ARROW-5384
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Go
>Reporter: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5387) [Go] properly handle sub-slice of List

2019-05-31 Thread Sebastien Binet (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastien Binet resolved ARROW-5387.

   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4360
[https://github.com/apache/arrow/pull/4360]

> [Go] properly handle sub-slice of List
> --
>
> Key: ARROW-5387
> URL: https://issues.apache.org/jira/browse/ARROW-5387
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Go
>Reporter: Sebastien Binet
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Consider an `array.List` with the following content:
> `[[0 1 2] (null) [3 4 5 6]]`
>  
> Sub-slicing it with `array.NewSlice(arr, 1, 3)`, we get
> `[(null) []]` instead of `[(null) [3 4 5 6]]`.
>  
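
A list array stores one offsets buffer (length n + 1) and a flat child array of values; a slice shares those buffers and only records a new start offset and length. The Python sketch below is a hypothetical minimal model of that layout, not the Go `array.List` implementation. It shows one way to reproduce the reported symptom: if the slice's own offset is applied to the validity bitmap but not when indexing the offsets buffer, the second element reads the empty range of the null slot.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ListArray:
    """Hypothetical minimal model of an Arrow list array."""
    offsets: List[int]       # value boundaries; len == number of slots + 1
    values: List[int]        # flattened child values
    valid: List[bool]        # validity per slot
    offset: int = 0          # first slot of this (possibly sliced) view
    length: Optional[int] = None

    def __post_init__(self):
        if self.length is None:
            self.length = len(self.valid) - self.offset

    def slice(self, i, j):
        """Zero-copy slice [i, j): share buffers, shift the offset."""
        return ListArray(self.offsets, self.values, self.valid,
                         self.offset + i, j - i)

    def value(self, i, honor_offset=True):
        """Return element i as a list, or None if it is null."""
        k = self.offset + i  # position in the shared buffers
        if not self.valid[k]:
            return None
        # honor_offset=False ignores the slice offset here, which
        # reproduces the symptom described in this issue.
        o = k if honor_offset else i
        return self.values[self.offsets[o]:self.offsets[o + 1]]

    def to_list(self, honor_offset=True):
        return [self.value(i, honor_offset) for i in range(self.length)]
```

For the example above, `offsets` is `[0, 3, 3, 7]` and `values` is `[0, 1, 2, 3, 4, 5, 6]`; slicing `[1, 3)` gives `[None, [3, 4, 5, 6]]` when the offset is honored, and `[None, []]` when it is not.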



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5457) [GLib][Plasma] Environment variable name for test is wrong

2019-05-31 Thread Micah Kornfield (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield resolved ARROW-5457.

   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4426
[https://github.com/apache/arrow/pull/4426]

> [GLib][Plasma] Environment variable name for test is wrong
> --
>
> Key: ARROW-5457
> URL: https://issues.apache.org/jira/browse/ARROW-5457
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GLib
>Affects Versions: 0.13.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5448) [CI] MinGW build failures on AppVeyor

2019-05-31 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-5448.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4428
[https://github.com/apache/arrow/pull/4428]

> [CI] MinGW build failures on AppVeyor
> -
>
> Key: ARROW-5448
> URL: https://issues.apache.org/jira/browse/ARROW-5448
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration
>Reporter: Antoine Pitrou
>Assignee: Kouhei Sutou
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Apparently the NumPy package is broken. See 
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/24922425/job/9yoq08uepk5p6dwb
> {code}
> -- Found PythonLibs: C:/msys64/mingw32/lib/libpython3.7m.dll.a
> CMake Error at cmake_modules/FindNumPy.cmake:62 (message):
>   NumPy import failure:
>   Traceback (most recent call last):
> File 
> "C:/msys64/mingw32/lib/python3.7/site-packages\numpy\core\__init__.py", line 
> 40, in 
>   from . import multiarray
> File 
> "C:/msys64/mingw32/lib/python3.7/site-packages\numpy\core\multiarray.py", 
> line 12, in 
>   from . import overrides
> File 
> "C:/msys64/mingw32/lib/python3.7/site-packages\numpy\core\overrides.py", line 
> 6, in 
>   from numpy.core._multiarray_umath import (
>   ImportError: DLL load failed: The specified module could not be found.
>   
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5455) [Rust] Build broken by 2019-05-30 Rust nightly

2019-05-31 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-5455.
-
Resolution: Fixed

Issue resolved by pull request 4429
[https://github.com/apache/arrow/pull/4429]

> [Rust] Build broken by 2019-05-30 Rust nightly
> --
>
> Key: ARROW-5455
> URL: https://issues.apache.org/jira/browse/ARROW-5455
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Wes McKinney
>Assignee: Chao Sun
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See example failed build:
> https://travis-ci.org/apache/arrow/jobs/539477452



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4301) [Java][Gandiva] Maven snapshot version update does not seem to update Gandiva submodule

2019-05-31 Thread Praveen Kumar Desabandu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852709#comment-16852709
 ] 

Praveen Kumar Desabandu commented on ARROW-4301:


[~wesmckinn] sorry for the delay in getting back on this; I will look into this 
over the weekend.

> [Java][Gandiva] Maven snapshot version update does not seem to update Gandiva 
> submodule
> ---
>
> Key: ARROW-4301
> URL: https://issues.apache.org/jira/browse/ARROW-4301
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Gandiva, Java
>Reporter: Wes McKinney
>Assignee: Praveen Kumar Desabandu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> See 
> https://github.com/apache/arrow/commit/a486db8c1476be1165981c4fe22996639da8e550.
>  This is breaking the build, so I'm going to patch it manually.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-5455) [Rust] Build broken by 2019-05-30 Rust nightly

2019-05-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5455:
--
Labels: pull-request-available  (was: )

> [Rust] Build broken by 2019-05-30 Rust nightly
> --
>
> Key: ARROW-5455
> URL: https://issues.apache.org/jira/browse/ARROW-5455
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Wes McKinney
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> See example failed build:
> https://travis-ci.org/apache/arrow/jobs/539477452



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-5455) [Rust] Build broken by 2019-05-30 Rust nightly

2019-05-31 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned ARROW-5455:
---

Assignee: Chao Sun

> [Rust] Build broken by 2019-05-30 Rust nightly
> --
>
> Key: ARROW-5455
> URL: https://issues.apache.org/jira/browse/ARROW-5455
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Wes McKinney
>Assignee: Chao Sun
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See example failed build:
> https://travis-ci.org/apache/arrow/jobs/539477452



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

