[jira] [Resolved] (ARROW-2012) [GLib] Support "make distclean"

2018-01-21 Thread Kouhei Sutou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-2012.
-
Resolution: Fixed

Issue resolved by pull request 1494
[https://github.com/apache/arrow/pull/1494]

> [GLib] Support "make distclean"
> ---
>
> Key: ARROW-2012
> URL: https://issues.apache.org/jira/browse/ARROW-2012
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-41) C++: Convert RecordBatch to StructArray, and back

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-41:
--
Summary: C++: Convert RecordBatch to StructArray, and back  (was: C++: 
Convert table to std::vector of Struct arrays)

> C++: Convert RecordBatch to StructArray, and back
> -
>
> Key: ARROW-41
> URL: https://issues.apache.org/jira/browse/ARROW-41
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.9.0
>
>
> This may require memory allocation depending on the chunking of the table 
> columns. 
> While tables and struct type columns are semantically equivalent (and tables 
> can be embedded in other tables using struct types), the memory layout of a 
> table may not be strictly contiguous. For the purposes of putting data on the 
> wire / in shared memory, it may be useful to offer a conversion function to 
> "structify" an in-memory logical Arrow table. See ARROW-24





[jira] [Updated] (ARROW-41) C++: Convert RecordBatch to StructArray, and back

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-41:
--
Description: With {{arrow::TableBatchReader}}, we can turn a Table into a 
sequence of one or more RecordBatches. It would be useful to be able to easily 
convert between RecordBatch and a StructArray (which can be semantically 
equivalent in some contexts)  (was: This may require memory allocation 
depending on the chunking of the table columns. 

While tables and struct type columns are semantically equivalent (and tables 
can be embedded in other tables using struct types), the memory layout of a 
table may not be strictly contiguous. For the purposes of putting data on the 
wire / in shared memory, it may be useful to offer a conversion function to 
"structify" an in-memory logical Arrow table. See ARROW-24)

> C++: Convert RecordBatch to StructArray, and back
> -
>
> Key: ARROW-41
> URL: https://issues.apache.org/jira/browse/ARROW-41
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.9.0
>
>
> With {{arrow::TableBatchReader}}, we can turn a Table into a sequence of one 
> or more RecordBatches. It would be useful to be able to easily convert 
> between RecordBatch and a StructArray (which can be semantically equivalent 
> in some contexts)
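
The equivalence described above can be illustrated with a toy model. This is a minimal sketch in plain Python, not the Arrow C++ API: the class names ToyRecordBatch and ToyStructArray and both helper functions are hypothetical. It assumes that a record batch and a struct array both reduce to named, equal-length child columns, so converting between them only re-wraps the same columns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyRecordBatch:
    """Hypothetical stand-in for a record batch: field names plus columns."""
    names: List[str]
    columns: List[list]


@dataclass
class ToyStructArray:
    """Hypothetical stand-in for a struct array: field names plus children."""
    field_names: List[str]
    children: List[list]
    length: int


def batch_to_struct(batch):
    # Every column must have the same length to form one struct array.
    lengths = {len(col) for col in batch.columns}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    length = lengths.pop() if lengths else 0
    # The child columns are reused as-is; nothing is copied.
    return ToyStructArray(list(batch.names), list(batch.columns), length)


def struct_to_batch(struct):
    # The reverse direction re-wraps the same children as columns.
    return ToyRecordBatch(list(struct.field_names), list(struct.children))


batch = ToyRecordBatch(["id", "label"], [[1, 2, 3], ["a", "b", "c"]])
struct = batch_to_struct(batch)
round_trip = struct_to_batch(struct)
```

In this toy model the round trip shares the original column objects, which is the sense in which the two representations can be equivalent.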





[jira] [Closed] (ARROW-388) [C++] Add a "shifted file" abstraction

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-388.
--
Resolution: Won't Fix

This is no longer needed

> [C++] Add a "shifted file" abstraction
> --
>
> Key: ARROW-388
> URL: https://issues.apache.org/jira/browse/ARROW-388
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> In an IPC setting, you may wish to interpret a file-like object by some other 
> frame of reference. For example, you may want to consider the middle of a 
> file to be the initial 0 position -- this would all be done with zero copy 
> and no "work" performed, strictly passing through method calls to the 
> underlying file interface. Related to ARROW-387, ARROW-384
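
A rough illustration of the idea in the description above, written in plain Python rather than against the Arrow C++ file interfaces. The class name ShiftedFile and its offset parameter are hypothetical, and only absolute seeks are handled in this sketch.

```python
import io


class ShiftedFile:
    """Pass-through view of a seekable file.

    Position 0 of this object corresponds to `offset` in the underlying
    file. No data is copied; calls are forwarded with adjusted positions.
    """

    def __init__(self, raw, offset):
        self._raw = raw
        self._offset = offset
        self._raw.seek(offset)

    def seek(self, pos):
        # Translate the shifted position into the underlying position.
        self._raw.seek(self._offset + pos)
        return pos

    def tell(self):
        return self._raw.tell() - self._offset

    def read(self, nbytes=-1):
        return self._raw.read(nbytes)


raw = io.BytesIO(b"headerPAYLOAD")
shifted = ShiftedFile(raw, offset=6)
first = shifted.read(3)      # reads from underlying position 6
position = shifted.tell()    # position in the shifted frame of reference
shifted.seek(0)
payload = shifted.read()
```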





[jira] [Updated] (ARROW-530) C++/Python: Provide subpools for better memory allocation tracking

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-530:
---
Fix Version/s: (was: 0.9.0)

> C++/Python: Provide subpools for better memory allocation tracking
> --
>
> Key: ARROW-530
> URL: https://issues.apache.org/jira/browse/ARROW-530
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: beginner, newbie
>
> Currently we can only track the number of bytes allocated by the main memory 
> pool or the alternative jemalloc implementation. To better understand certain 
> situations, we should provide a MemoryPool proxy implementation that tracks 
> only the amount of memory that was allocated through its direct calls but 
> delegates the actual allocation to an underlying pool.
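
The proxy idea can be sketched as follows. This is plain Python for illustration only, not the Arrow MemoryPool interface: the names BasePool and ProxyPool and their methods are hypothetical, and buffers are modelled as bytearray objects.

```python
class BasePool:
    """Hypothetical underlying pool that performs the real allocations."""

    def __init__(self):
        self.bytes_allocated = 0

    def allocate(self, size):
        self.bytes_allocated += size
        return bytearray(size)

    def free(self, buf):
        self.bytes_allocated -= len(buf)


class ProxyPool:
    """Counts only the bytes requested through this proxy.

    The actual allocation and release are delegated to the parent pool.
    """

    def __init__(self, parent):
        self._parent = parent
        self.bytes_allocated = 0

    def allocate(self, size):
        buf = self._parent.allocate(size)
        self.bytes_allocated += size
        return buf

    def free(self, buf):
        self.bytes_allocated -= len(buf)
        self._parent.free(buf)


base = BasePool()
unrelated = base.allocate(100)   # allocated directly, not via the proxy
proxy = ProxyPool(base)
buf = proxy.allocate(64)         # tracked by both proxy and base
```

Here the proxy reports 64 bytes while the base pool reports 164, so the proxy isolates the allocations made by one component.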





[jira] [Updated] (ARROW-567) [C++] File and stream APIs for interacting with "large" schemas

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-567:
---
Fix Version/s: (was: 0.9.0)

> [C++] File and stream APIs for interacting with "large" schemas
> ---
>
> Key: ARROW-567
> URL: https://issues.apache.org/jira/browse/ARROW-567
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> For data where the metadata itself is large (> 1 fields), doing a full 
> in-memory reconstruction of a record batch may be impractical if the user's 
> goal is to do random access on a potentially small subset of a batch. 
> I propose adding an API that enables "cheap" inspection of the record batch 
> metadata and reconstruction of fields. 
> Because of the flattened buffer and field metadata, at the moment the 
> complexity of random field access will scale with the number of fields -- in 
> the future we may devise strategies to mitigate this (e.g. storing a 
> pre-computed buffer/field lookup table in the schema)
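
To illustrate the scaling concern, here is a small sketch in plain Python. It assumes a flattened layout in which each field owns a known number of consecutive buffers; the counts in buffers_per_field are made up, and the function names are hypothetical rather than part of any Arrow API.

```python
from itertools import accumulate

# Hypothetical number of buffers owned by each field, in schema order.
buffers_per_field = [2, 3, 2, 1]


def first_buffer_scan(field_index):
    # Without a lookup table, the start position is found by summing the
    # buffer counts of all preceding fields: cost grows with field_index.
    return sum(buffers_per_field[:field_index])


# Pre-computed lookup table of prefix sums, built once per schema.
buffer_start = [0] + list(accumulate(buffers_per_field))


def buffers_for_field(field_index):
    # With the table, locating the buffers of a field is a constant-time
    # lookup of two adjacent entries.
    return range(buffer_start[field_index], buffer_start[field_index + 1])


third_field_buffers = list(buffers_for_field(2))
```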





[jira] [Updated] (ARROW-554) [C++] Implement functions to conform unequal dictionaries amongst multiple Arrow arrays

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-554:
---
Fix Version/s: (was: 0.9.0)
   0.10.0

> [C++] Implement functions to conform unequal dictionaries amongst multiple 
> Arrow arrays
> ---
>
> Key: ARROW-554
> URL: https://issues.apache.org/jira/browse/ARROW-554
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: Analytics
> Fix For: 0.10.0
>
>
> We may wish to either
> * Conform the dictionary indices to reference a common dictionary
> * Concatenate indices into a new array with a common dictionary
> This is related to in-memory dictionary encoding, as you start with a 
> partially-built dictionary and then add entries as you observe new ones in 
> other dictionaries, all the while "rebasing" indices to consistently 
> reference the same dictionary at the end
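
The rebasing described above can be sketched in plain Python. This is an illustration under simplifying assumptions, not an Arrow kernel: each dictionary-encoded array is modelled as a (dictionary, indices) pair of lists, and the function name unify_dictionaries is hypothetical.

```python
def unify_dictionaries(arrays):
    """Build one common dictionary and rebase every index array onto it.

    `arrays` is a list of (dictionary, indices) pairs.
    Returns (common_dictionary, list_of_rebased_index_lists).
    """
    common = []      # the growing common dictionary
    position = {}    # value -> index in the common dictionary
    rebased = []
    for dictionary, indices in arrays:
        # Map each entry of this dictionary to its common position,
        # appending entries that have not been observed before.
        mapping = []
        for value in dictionary:
            if value not in position:
                position[value] = len(common)
                common.append(value)
            mapping.append(position[value])
        rebased.append([mapping[i] for i in indices])
    return common, rebased


a = (["x", "y"], [0, 1, 1])
b = (["y", "z"], [1, 0, 1])
common, rebased = unify_dictionaries([a, b])

# Second option from the description: concatenate the rebased indices
# into a single index array over the common dictionary.
concatenated = [i for chunk in rebased for i in chunk]
decoded = [common[i] for i in concatenated]
```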





[jira] [Updated] (ARROW-911) [Python] Expand development.rst with build instructions without conda

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-911:
---
Fix Version/s: 0.9.0

> [Python] Expand development.rst with build instructions without conda
> -
>
> Key: ARROW-911
> URL: https://issues.apache.org/jira/browse/ARROW-911
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.9.0
>
>
> There should be sufficient detail to install on at least OS X and Linux using 
> the built-in thirdparty build toolchain
> https://github.com/wesm/arrow/blob/ee5cb2ad171f0f4c7673f2937dc226d62aad972c/python/README.md





[jira] [Assigned] (ARROW-906) [C++] Serialize Field metadata to IPC metadata

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-906:
--

Assignee: Wes McKinney

> [C++] Serialize Field metadata to IPC metadata
> --
>
> Key: ARROW-906
> URL: https://issues.apache.org/jira/browse/ARROW-906
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.9.0
>
>
> Follow up work to ARROW-898





[jira] [Commented] (ARROW-911) [Python] Expand development.rst with build instructions without conda

2018-01-21 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333683#comment-16333683
 ] 

Wes McKinney commented on ARROW-911:


Is there anything we need to improve here?

> [Python] Expand development.rst with build instructions without conda
> -
>
> Key: ARROW-911
> URL: https://issues.apache.org/jira/browse/ARROW-911
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.9.0
>
>
> There should be sufficient detail to install on at least OS X and Linux using 
> the built-in thirdparty build toolchain
> https://github.com/wesm/arrow/blob/ee5cb2ad171f0f4c7673f2937dc226d62aad972c/python/README.md





[jira] [Updated] (ARROW-1042) [Python] C++ API plumbing for returning generic instance of ipc::RecordBatchReader to user

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1042:

Fix Version/s: (was: 0.9.0)

> [Python] C++ API plumbing for returning generic instance of 
> ipc::RecordBatchReader to user
> --
>
> Key: ARROW-1042
> URL: https://issues.apache.org/jira/browse/ARROW-1042
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Wes McKinney
>Priority: Major
>
> Currently we have no mechanism of wrapping a 
> {{std::shared_ptr}} like we do with some other 
> Arrow types





[jira] [Updated] (ARROW-1012) [C++] Create implementation of StreamReader that reads from Apache Parquet files

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1012:

Fix Version/s: (was: 0.9.0)

> [C++] Create implementation of StreamReader that reads from Apache Parquet 
> files
> 
>
> Key: ARROW-1012
> URL: https://issues.apache.org/jira/browse/ARROW-1012
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> This will be enabled by ARROW-1008





[jira] [Updated] (ARROW-1009) [C++] Create asynchronous version of StreamReader

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1009:

Fix Version/s: (was: 0.9.0)

> [C++] Create asynchronous version of StreamReader
> -
>
> Key: ARROW-1009
> URL: https://issues.apache.org/jira/browse/ARROW-1009
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> The {{AsyncStreamReader}} would buffer the next record batch in a background 
> thread, while emulating the current synchronous / blocking API
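
A minimal sketch of the buffering idea in plain Python, assuming the synchronous reader can be modelled as an iterator of batches. The class name AsyncReader, the depth parameter, and read_next are hypothetical and do not correspond to the Arrow C++ API; errors raised by the underlying reader are not propagated in this sketch.

```python
import queue
import threading


class AsyncReader:
    """Prefetches batches in a background thread behind a blocking API."""

    _DONE = object()  # sentinel marking the end of the stream

    def __init__(self, reader, depth=1):
        # At most `depth` batches are buffered ahead of the consumer.
        self._queue = queue.Queue(maxsize=depth)
        self._thread = threading.Thread(
            target=self._fill, args=(reader,), daemon=True
        )
        self._thread.start()

    def _fill(self, reader):
        for batch in reader:
            self._queue.put(batch)   # blocks while the buffer is full
        self._queue.put(self._DONE)

    def read_next(self):
        """Block until the next batch is ready; None means end of stream."""
        item = self._queue.get()
        if item is self._DONE:
            # Put the sentinel back so later calls also return None.
            self._queue.put(self._DONE)
            return None
        return item


reader = AsyncReader(iter([[1, 2], [3], [4, 5, 6]]))
batches = []
while True:
    batch = reader.read_next()
    if batch is None:
        break
    batches.append(batch)
```

The caller sees the same blocking read loop as before, while the next batch is being produced in the background.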





[jira] [Updated] (ARROW-1207) [C++] Implement Map logical type

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1207:

Fix Version/s: (was: 0.9.0)
   1.0.0

> [C++] Implement Map logical type
> 
>
> Key: ARROW-1207
> URL: https://issues.apache.org/jira/browse/ARROW-1207
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> A map is implemented as a list of structs with fields key and value. We 
> should separately discuss whether this merits an addition to the Arrow 
> metadata 
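
The "list of structs with fields key and value" representation can be sketched in plain Python as follows. This is an illustration only: the offsets, keys, and values lists stand in for the list offsets and the two struct children, and the helper names are hypothetical.

```python
def encode_maps(maps):
    """Flatten a sequence of dicts into list offsets plus key/value children."""
    offsets, keys, values = [0], [], []
    for m in maps:
        for k, v in m.items():
            keys.append(k)
            values.append(v)
        # Each map ends where the flattened children currently end.
        offsets.append(len(keys))
    return offsets, keys, values


def decode_map(offsets, keys, values, i):
    """Rebuild the i-th map from its slice of the key/value children."""
    start, stop = offsets[i], offsets[i + 1]
    return dict(zip(keys[start:stop], values[start:stop]))


offsets, keys, values = encode_maps([{"a": 1, "b": 2}, {}, {"c": 3}])
last = decode_map(offsets, keys, values, 2)
```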





[jira] [Updated] (ARROW-1280) [C++] Implement Fixed Size List type

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1280:

Fix Version/s: (was: 0.9.0)
   1.0.0

> [C++] Implement Fixed Size List type
> 
>
> Key: ARROW-1280
> URL: https://issues.apache.org/jira/browse/ARROW-1280
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Updated] (ARROW-1279) Integration tests for Map type

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1279:

Fix Version/s: 1.0.0

> Integration tests for Map type
> --
>
> Key: ARROW-1279
> URL: https://issues.apache.org/jira/browse/ARROW-1279
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Wes McKinney
>Assignee: Siddharth Teotia
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Assigned] (ARROW-1280) [C++] Implement Fixed Size List type

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-1280:
---

Assignee: Wes McKinney

> [C++] Implement Fixed Size List type
> 
>
> Key: ARROW-1280
> URL: https://issues.apache.org/jira/browse/ARROW-1280
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Updated] (ARROW-1391) [Python] Benchmarks for python serialization

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1391:

Fix Version/s: (was: 0.9.0)

> [Python] Benchmarks for python serialization
> 
>
> Key: ARROW-1391
> URL: https://issues.apache.org/jira/browse/ARROW-1391
> Project: Apache Arrow
>  Issue Type: Wish
>Reporter: Philipp Moritz
>Priority: Minor
>
> It would be great to have a suite of relevant benchmarks for the Python 
> serialization code in ARROW-759. These could be used to guide profiling and 
> performance improvements.
> Relevant use cases include:
> - dictionaries of large numpy arrays that are used to represent weights of a 
> neural network
> - long lists of primitive types like ints, floats or strings
> - lists of user defined python objects





[jira] [Created] (ARROW-2016) [Python] Fix up ASV benchmarking setup and document procedure for use

2018-01-21 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2016:
---

 Summary: [Python] Fix up ASV benchmarking setup and document 
procedure for use
 Key: ARROW-2016
 URL: https://issues.apache.org/jira/browse/ARROW-2016
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney
 Fix For: 0.9.0


We need to start writing more microbenchmarks





[jira] [Resolved] (ARROW-911) [Python] Expand development.rst with build instructions without conda

2018-01-21 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-911.
---
   Resolution: Fixed
 Assignee: Uwe L. Korn
Fix Version/s: (was: 0.9.0)
   0.8.0

> [Python] Expand development.rst with build instructions without conda
> -
>
> Key: ARROW-911
> URL: https://issues.apache.org/jira/browse/ARROW-911
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.8.0
>
>
> There should be sufficient detail to install on at least OS X and Linux using 
> the built-in thirdparty build toolchain
> https://github.com/wesm/arrow/blob/ee5cb2ad171f0f4c7673f2937dc226d62aad972c/python/README.md





[jira] [Updated] (ARROW-1424) [Python] Initial bindings for libarrow_gpu

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1424:

Fix Version/s: (was: 0.9.0)

> [Python] Initial bindings for libarrow_gpu
> --
>
> Key: ARROW-1424
> URL: https://issues.apache.org/jira/browse/ARROW-1424
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: GPU, Python
>Reporter: Wes McKinney
>Priority: Major
>






[jira] [Commented] (ARROW-911) [Python] Expand development.rst with build instructions without conda

2018-01-21 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333684#comment-16333684
 ] 

Uwe L. Korn commented on ARROW-911:
---

The Python documentation is sufficient to build using {{pip}}. In the case 
where no {{arrow_python}} is being built, it also suffices to show the needed 
steps for a conda-free setup.

> [Python] Expand development.rst with build instructions without conda
> -
>
> Key: ARROW-911
> URL: https://issues.apache.org/jira/browse/ARROW-911
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.8.0
>
>
> There should be sufficient detail to install on at least OS X and Linux using 
> the built-in thirdparty build toolchain
> https://github.com/wesm/arrow/blob/ee5cb2ad171f0f4c7673f2937dc226d62aad972c/python/README.md





[jira] [Updated] (ARROW-1465) [C++] Accommodate ABI changes in libhdfs3 from 2.2.31 to 2.3 on conda-forge

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1465:

Fix Version/s: (was: 0.9.0)

> [C++] Accommodate ABI changes in libhdfs3 from 2.2.31 to 2.3 on conda-forge
> ---
>
> Key: ARROW-1465
> URL: https://issues.apache.org/jira/browse/ARROW-1465
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>






[jira] [Updated] (ARROW-1423) [C++] Create non-owned CudaContext from context handle provided by thirdparty user

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1423:

Fix Version/s: (was: 0.9.0)

> [C++] Create non-owned CudaContext from context handle provided by thirdparty 
> user
> --
>
> Key: ARROW-1423
> URL: https://issues.apache.org/jira/browse/ARROW-1423
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GPU
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>
> Follow-up work to ARROW-1364. This will enable Arrow to allocate device 
> memory within an existing driver context rather than having to create its own 
> separate context





[jira] [Resolved] (ARROW-1485) [C++] Implement union-like data type for accommodating kernel arguments which may be scalars or arrays

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1485.
-
   Resolution: Fixed
 Assignee: Wes McKinney
Fix Version/s: 0.8.0

This was initially done in 
https://github.com/apache/arrow/commit/f2806fa518583907a129b2ecb0b7ec8758b69e17

> [C++] Implement union-like data type for accommodating kernel arguments which 
> may be scalars or arrays
> --
>
> Key: ARROW-1485
> URL: https://issues.apache.org/jira/browse/ARROW-1485
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.8.0
>
>
> For example, either argument to the binary operator {{Add}} may be scalar or 
> array. Some systems have addressed this issue by using arrays of length 1, 
> but I would prefer that we accommodate this in Arrow's C++ data types
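
As an illustration of the scalar-or-array argument idea, here is a plain-Python sketch. It is not the data type introduced in the linked commit: the names Scalar, Array, Datum, and add are hypothetical, and values are modelled as Python floats.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Scalar:
    value: float


@dataclass
class Array:
    values: List[float]


# A kernel argument is either a single value or an array of values.
Datum = Union[Scalar, Array]


def add(left: Datum, right: Datum) -> Datum:
    """Binary Add that accepts a scalar or an array on either side."""
    if isinstance(left, Scalar) and isinstance(right, Scalar):
        return Scalar(left.value + right.value)
    if isinstance(left, Scalar):
        # Scalar on the left: apply it to every element of the array.
        return Array([left.value + v for v in right.values])
    if isinstance(right, Scalar):
        return Array([v + right.value for v in left.values])
    if len(left.values) != len(right.values):
        raise ValueError("array arguments must have the same length")
    return Array([a + b for a, b in zip(left.values, right.values)])


mixed = add(Array([1.0, 2.0]), Scalar(10.0))
both_scalar = add(Scalar(1.0), Scalar(2.0))
```

Compared with wrapping a scalar in a length-1 array, the kernel can tell the two cases apart from the argument type alone.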





[jira] [Updated] (ARROW-1596) [Python] Expand serialization test suite for NumPy arrays

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1596:

Fix Version/s: (was: 0.9.0)

> [Python] Expand serialization test suite for NumPy arrays
> -
>
> Key: ARROW-1596
> URL: https://issues.apache.org/jira/browse/ARROW-1596
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
>
> see 
> https://github.com/dask/distributed/blob/master/distributed/protocol/tests/test_numpy.py#L30-L65
>  for inspiration





[jira] [Resolved] (ARROW-1660) [Python] pandas field values are messed up across rows

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1660.
-
Resolution: Cannot Reproduce

If you can post some Parquet files that reproduce the issue, please reopen the 
issue. Thank you.

> [Python] pandas field values are messed up across rows
> --
>
> Key: ARROW-1660
> URL: https://issues.apache.org/jira/browse/ARROW-1660
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.1
> Environment: 4.4.0-72-generic #93-Ubuntu SMP x86_64, python3
>Reporter: MIkhail Osckin
>Assignee: Wes McKinney
>Priority: Major
>
> I have the following Scala case class to store sparse matrix data so that I 
> can read it later using Python
> {code:java}
> case class CooVector(
>     id: Int,
>     row_ids: Seq[Int],
>     rowsIdx: Seq[Int],
>     colIdx: Seq[Int],
>     data: Seq[Double])
> {code}
> I save the dataset of this type to multiple Parquet files using Spark and 
> then read it using pyarrow.parquet and convert the result to a pandas 
> DataFrame. The problem I have is that some values end up in the wrong rows; 
> for example, row_ids might end up in the wrong CooVector row. I have no idea 
> what the reason is, but it might be related to the fact that the fields are 
> of variable sizes. Everything is correct if I read it using Spark. I also 
> checked the to_pydict method and the result is correct, so it seems the 
> problem is somewhere in the to_pandas method.





[jira] [Commented] (ARROW-2016) [Python] Fix up ASV benchmarking setup and document procedure for use

2018-01-21 Thread Robert Nishihara (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333689#comment-16333689
 ] 

Robert Nishihara commented on ARROW-2016:
-

Are all of the benchmarks in 
[https://github.com/apache/arrow/tree/f72279b2dbfc663d2217e64075dd731199f12611/python/benchmarks]? 
Any others?

> [Python] Fix up ASV benchmarking setup and document procedure for use
> -
>
> Key: ARROW-2016
> URL: https://issues.apache.org/jira/browse/ARROW-2016
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.9.0
>
>
> We need to start writing more microbenchmarks





[jira] [Updated] (ARROW-1789) [Format] Consolidate specification documents and improve clarity for new implementation authors

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1789:

Fix Version/s: 0.9.0

> [Format] Consolidate specification documents and improve clarity for new 
> implementation authors
> ---
>
> Key: ARROW-1789
> URL: https://issues.apache.org/jira/browse/ARROW-1789
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.9.0
>
>
> See discussion in https://github.com/apache/arrow/issues/1296
> I believe the specification documents Layout.md, Metadata.md, and IPC.md 
> would benefit from being consolidated into a single Markdown document that 
> would be sufficient (along with the Flatbuffers schemas) to create a complete 
> Arrow implementation capable of reading and writing the binary format





[jira] [Comment Edited] (ARROW-2016) [Python] Fix up ASV benchmarking setup and document procedure for use

2018-01-21 Thread Robert Nishihara (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333689#comment-16333689
 ] 

Robert Nishihara edited comment on ARROW-2016 at 1/21/18 9:24 PM:
--

Are all of the benchmarks in 
[https://github.com/apache/arrow/tree/master/python/benchmarks] or are there 
any others?


was (Author: robertnishihara):
Are all of the benchmarks in 
[https://github.com/apache/arrow/tree/f72279b2dbfc663d2217e64075dd731199f12611/python/benchmarks]? 
Any others?

> [Python] Fix up ASV benchmarking setup and document procedure for use
> -
>
> Key: ARROW-2016
> URL: https://issues.apache.org/jira/browse/ARROW-2016
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.9.0
>
>
> We need to start writing more microbenchmarks





[jira] [Updated] (ARROW-1774) [C++] Add "view" function to create zero-copy views for compatible types, if supported

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1774:

Fix Version/s: (was: 0.9.0)

> [C++] Add "view" function to create zero-copy views for compatible types, if 
> supported
> --
>
> Key: ARROW-1774
> URL: https://issues.apache.org/jira/browse/ARROW-1774
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> Similar to NumPy's {{ndarray.view}}, but with the restriction that the input 
> and output types have the same physical Arrow memory layout. This might be as 
> simple as adding a "zero copy only" option to the existing {{Cast}} kernel
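
The restriction mentioned above (same physical layout, no copy) can be illustrated with Python's built-in memoryview. This is only an analogy, not the proposed Arrow function: the helper name view is hypothetical, and struct format characters stand in for Arrow types.

```python
import struct


def view(buffer, src_format, dst_format):
    """Reinterpret `buffer` as `dst_format` items without copying.

    The view is refused unless both formats have the same item width,
    which stands in for "same physical memory layout" in this sketch.
    """
    if struct.calcsize(src_format) != struct.calcsize(dst_format):
        raise TypeError("view requires identical physical layout")
    # memoryview casts must go through a byte format first.
    return memoryview(buffer).cast("B").cast(dst_format)


# Two signed 32-bit integers in native byte order.
data = bytearray(struct.pack("=2i", 1, -1))

# Same bytes viewed as unsigned 32-bit integers.
as_uint32 = view(data, "i", "I")

# Writing through the view changes the original buffer: nothing was copied.
as_uint32[0] = 7
first_signed = struct.unpack("=2i", data)[0]
```

A request such as view(data, "i", "q") is rejected in this sketch because the item widths differ, which mirrors a "zero copy only" option failing instead of converting.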





[jira] [Updated] (ARROW-1824) [C++] Add better GPU support for RecordBatch objects in arrow::ipc::*

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1824:

Summary: [C++] Add better GPU support for RecordBatch objects in 
arrow::ipc::*  (was: [C++] Add RecordBatch support to arrow::ipc::*)

> [C++] Add better GPU support for RecordBatch objects in arrow::ipc::*
> -
>
> Key: ARROW-1824
> URL: https://issues.apache.org/jira/browse/ARROW-1824
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GPU
>Affects Versions: 0.7.1
>Reporter: Kouhei Sutou
>Priority: Major
>
> The current arrow::ipc::* classes, such as RecordBatchStreamReader and 
> RecordBatchStreamWriter, don't work for a RecordBatch on the GPU. It would be 
> useful if they could process a RecordBatch on the GPU the same way as on the 
> CPU, with the same API.
> ARROW-1808 may be required for implementing this feature.





[jira] [Updated] (ARROW-1997) [Python] to_pandas with strings_to_categorical fails

2018-01-21 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1997:

Fix Version/s: 0.9.0

> [Python] to_pandas with strings_to_categorical fails
> 
>
> Key: ARROW-1997
> URL: https://issues.apache.org/jira/browse/ARROW-1997
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Licht Takeuchi
>Assignee: Licht Takeuchi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Repro code is below.
> It seems that an unexpected deallocation occurred.
> {code:java}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({
>     'Foo': ['A', 'A', 'B', 'B']
> })
> table = pa.Table.from_pandas(df)
> df = table.to_pandas(strings_to_categorical=True)
> {code}


