[jira] [Created] (ARROW-1055) [C++] Create add-on library for CUDA / GPU integration

2017-05-19 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1055:
---

 Summary: [C++] Create add-on library for CUDA / GPU integration
 Key: ARROW-1055
 URL: https://issues.apache.org/jira/browse/ARROW-1055
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney


The initial goal will be to enable IPC record batch loading and other core 
operations on the GPU. We can attach JIRAs to this parent issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ARROW-1036) [C++] Define abstract API for filtering Arrow streams (e.g. predicate evaluation)

2017-05-19 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017795#comment-16017795
 ] 

Wes McKinney commented on ARROW-1036:
-

Sure, whenever you're ready, feel free to go for it (you can make a PR into the 
arrow/site directory).

> [C++] Define abstract API for filtering Arrow streams (e.g. predicate 
> evaluation)
> -
>
> Key: ARROW-1036
> URL: https://issues.apache.org/jira/browse/ARROW-1036
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>
> It would be useful to be able to apply analytic predicates to an Arrow stream 
> in a composable way. As soon as we are able to compute some simple predicates 
> on in-memory Arrow data, we could define our first version of this.





[jira] [Commented] (ARROW-1054) [Python] Test suite fails on pandas 0.19.2

2017-05-19 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017411#comment-16017411
 ] 

Wes McKinney commented on ARROW-1054:
-

PR: https://github.com/apache/arrow/pull/705

> [Python] Test suite fails on pandas 0.19.2
> --
>
> Key: ARROW-1054
> URL: https://issues.apache.org/jira/browse/ARROW-1054
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.4.0
>Reporter: Wes McKinney
>Assignee: Wes McKinney
> Fix For: 0.4.0
>
>
> The test test_pandas_serialize_round_trip_multi_index fails. 





[jira] [Updated] (ARROW-1054) [Python] Test suite fails on pandas 0.19.2

2017-05-19 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1054:

Description: The test test_pandas_serialize_round_trip_multi_index fails.   
(was: The test test_pandas_serialize_round_trip_multi_index fails. Not a 
blocker for 0.4.0, but we should fix it.)

> [Python] Test suite fails on pandas 0.19.2
> --
>
> Key: ARROW-1054
> URL: https://issues.apache.org/jira/browse/ARROW-1054
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.4.0
>Reporter: Wes McKinney
>Assignee: Wes McKinney
> Fix For: 0.4.0
>
>
> The test test_pandas_serialize_round_trip_multi_index fails. 





[jira] [Assigned] (ARROW-1054) [Python] Test suite fails on pandas 0.19.2

2017-05-19 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-1054:
---

Assignee: Wes McKinney

> [Python] Test suite fails on pandas 0.19.2
> --
>
> Key: ARROW-1054
> URL: https://issues.apache.org/jira/browse/ARROW-1054
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.4.0
>Reporter: Wes McKinney
>Assignee: Wes McKinney
> Fix For: 0.4.0
>
>
> The test test_pandas_serialize_round_trip_multi_index fails. Not a blocker 
> for 0.4.0, but we should fix it.





[jira] [Updated] (ARROW-1054) [Python] Test suite fails on pandas 0.19.2

2017-05-19 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1054:

Fix Version/s: 0.4.0

> [Python] Test suite fails on pandas 0.19.2
> --
>
> Key: ARROW-1054
> URL: https://issues.apache.org/jira/browse/ARROW-1054
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.4.0
>Reporter: Wes McKinney
> Fix For: 0.4.0
>
>
> The test test_pandas_serialize_round_trip_multi_index fails. Not a blocker 
> for 0.4.0, but we should fix it.





[jira] [Created] (ARROW-1054) [Python] Test suite fails on pandas 0.19.2

2017-05-19 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1054:
---

 Summary: [Python] Test suite fails on pandas 0.19.2
 Key: ARROW-1054
 URL: https://issues.apache.org/jira/browse/ARROW-1054
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.4.0
Reporter: Wes McKinney


The test test_pandas_serialize_round_trip_multi_index fails. Not a blocker for 
0.4.0, but we should fix it.





[jira] [Resolved] (ARROW-1053) [Python] Memory leak with RecordBatchFileReader

2017-05-19 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1053.
-
Resolution: Fixed

Issue resolved by pull request 704
[https://github.com/apache/arrow/pull/704]

> [Python] Memory leak with RecordBatchFileReader
> ---
>
> Key: ARROW-1053
> URL: https://issues.apache.org/jira/browse/ARROW-1053
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Bryan Cutler
>Assignee: Wes McKinney
> Fix For: 0.4.0
>
>
> While working on SPARK-13534 and running repeated calls to {{toPandas}}, I 
> noticed memory usage continuing to climb, and I isolated it to the Python side. 
> The following code reproduces the issue, which looks like a memory leak. When 
> I comment out the block with the {{RecordBatchFileReader}} but leave the 
> writer, memory usage is stable, so I believe the issue is with the reader.
> {noformat}
> import pyarrow as pa
> import numpy as np
> import memory_profiler
> import gc
> import io
>
> def leak():
>     data = [pa.array(np.concatenate([np.random.randn(10)] * 10))]
>     table = pa.Table.from_arrays(data, ['foo'])
>     while True:
>         print('calling to_pandas')
>         print('memory_usage: {0}'.format(memory_profiler.memory_usage()))
>         df = table.to_pandas()
>         batch = pa.RecordBatch.from_pandas(df)
>         # Write the batch to an in-memory Arrow file
>         sink = io.BytesIO()
>         writer = pa.RecordBatchFileWriter(sink, batch.schema)
>         writer.write_batch(batch)
>         writer.close()
>         # Read it back; commenting out these two lines keeps memory stable
>         reader = pa.open_file(pa.BufferReader(sink.getvalue()))
>         reader.read_all()
>         gc.collect()
>
> leak()
> {noformat}
> Some of the output from the code above:
> {noformat}
> calling to_pandas
> memory_usage: [67.0546875]
> calling to_pandas
> memory_usage: [143.95703125]
> calling to_pandas
> memory_usage: [151.58984375]
> calling to_pandas
> memory_usage: [174.453125]
> calling to_pandas
> memory_usage: [189.84765625]
> calling to_pandas
> memory_usage: [212.7109375]
> calling to_pandas
> memory_usage: [228.046875]
> calling to_pandas
> memory_usage: [243.109375]
> calling to_pandas
> memory_usage: [258.4375]
> calling to_pandas
> memory_usage: [273.83203125]
> calling to_pandas
> memory_usage: [288.90234375]
> calling to_pandas
> memory_usage: [304.23046875]
> calling to_pandas
> memory_usage: [319.625]
> {noformat}


