[jira] [Created] (ARROW-1055) [C++] Create add-on library for CUDA / GPU integration
Wes McKinney created ARROW-1055:

Summary: [C++] Create add-on library for CUDA / GPU integration
Key: ARROW-1055
URL: https://issues.apache.org/jira/browse/ARROW-1055
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Wes McKinney

The initial goal will be to enable IPC record batch loading and other core operations on the GPU. We can attach JIRAs to this parent issue.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (ARROW-1036) [C++] Define abstract API for filtering Arrow streams (e.g. predicate evaluation)
[ https://issues.apache.org/jira/browse/ARROW-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017795#comment-16017795 ]

Wes McKinney commented on ARROW-1036:

Sure, whenever you're ready, feel free to go for it (you can make a PR into the arrow/site directory).

> [C++] Define abstract API for filtering Arrow streams (e.g. predicate
> evaluation)
>
> Key: ARROW-1036
> URL: https://issues.apache.org/jira/browse/ARROW-1036
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Wes McKinney
>
> It would be useful to be able to apply analytic predicates to an Arrow stream
> in a composable way. As soon as we are able to compute some simple predicates
> on in-memory Arrow data, we could define our first version of this.
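The composable filtering idea quoted above can be illustrated with a small plain-Python sketch. Everything in it is hypothetical: `Predicate`, `greater`, `less`, and `filter_stream` are invented names rather than Arrow APIs, and a dict mapping column names to Python lists stands in for a record batch so that the example runs without pyarrow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical stand-in for an Arrow record batch: column name -> values.
Batch = Dict[str, List[float]]


@dataclass(frozen=True)
class Predicate:
    """Hypothetical composable row predicate (not an Arrow API)."""

    # Maps a batch to one boolean per row.
    fn: Callable[[Batch], List[bool]]

    def evaluate(self, batch: Batch) -> List[bool]:
        return self.fn(batch)

    def __and__(self, other: "Predicate") -> "Predicate":
        # Builds a new predicate; nothing is evaluated until a batch arrives.
        return Predicate(
            lambda b: [x and y for x, y in zip(self.evaluate(b), other.evaluate(b))]
        )

    def __or__(self, other: "Predicate") -> "Predicate":
        return Predicate(
            lambda b: [x or y for x, y in zip(self.evaluate(b), other.evaluate(b))]
        )

    def __invert__(self) -> "Predicate":
        return Predicate(lambda b: [not x for x in self.evaluate(b)])


def greater(column: str, value: float) -> Predicate:
    """Rows where `column` > `value`."""
    return Predicate(lambda b: [v > value for v in b[column]])


def less(column: str, value: float) -> Predicate:
    """Rows where `column` < `value`."""
    return Predicate(lambda b: [v < value for v in b[column]])


def filter_stream(batches: Iterable[Batch], pred: Predicate) -> Iterator[Batch]:
    """Lazily apply `pred` to each batch, keeping only the matching rows."""
    for batch in batches:
        mask = pred.evaluate(batch)
        yield {
            name: [v for v, keep in zip(col, mask) if keep]
            for name, col in batch.items()
        }


if __name__ == "__main__":
    stream = [
        {"x": [1.0, 5.0, 10.0], "y": [0.5, 2.0, 3.0]},
        {"x": [7.0, 2.0], "y": [9.0, 1.0]},
    ]
    # Compose two simple predicates, then apply them batch by batch.
    for out in filter_stream(stream, greater("x", 3.0) & less("y", 5.0)):
        print(out)
```

Because `&`, `|`, and `~` only build new `Predicate` objects, filters can be combined before any data arrives and then applied batch by batch as the stream is consumed. An Arrow implementation would presumably evaluate the predicates over columnar arrays instead of Python lists.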
[jira] [Commented] (ARROW-1054) [Python] Test suite fails on pandas 0.19.2
[ https://issues.apache.org/jira/browse/ARROW-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017411#comment-16017411 ]

Wes McKinney commented on ARROW-1054:

PR: https://github.com/apache/arrow/pull/705

> [Python] Test suite fails on pandas 0.19.2
>
> Key: ARROW-1054
> URL: https://issues.apache.org/jira/browse/ARROW-1054
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.4.0
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Fix For: 0.4.0
>
> The test test_pandas_serialize_round_trip_multi_index fails.
[jira] [Updated] (ARROW-1054) [Python] Test suite fails on pandas 0.19.2
[ https://issues.apache.org/jira/browse/ARROW-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-1054:

Description: The test test_pandas_serialize_round_trip_multi_index fails.
(was: The test test_pandas_serialize_round_trip_multi_index fails. Not a blocker for 0.4.0, but we should fix.)

> [Python] Test suite fails on pandas 0.19.2
>
> Key: ARROW-1054
> URL: https://issues.apache.org/jira/browse/ARROW-1054
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.4.0
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Fix For: 0.4.0
>
> The test test_pandas_serialize_round_trip_multi_index fails.
[jira] [Assigned] (ARROW-1054) [Python] Test suite fails on pandas 0.19.2
[ https://issues.apache.org/jira/browse/ARROW-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney reassigned ARROW-1054:

Assignee: Wes McKinney

> [Python] Test suite fails on pandas 0.19.2
>
> Key: ARROW-1054
> URL: https://issues.apache.org/jira/browse/ARROW-1054
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.4.0
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Fix For: 0.4.0
>
> The test test_pandas_serialize_round_trip_multi_index fails. Not a blocker
> for 0.4.0, but we should fix.
[jira] [Updated] (ARROW-1054) [Python] Test suite fails on pandas 0.19.2
[ https://issues.apache.org/jira/browse/ARROW-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-1054:

Fix Version/s: 0.4.0

> [Python] Test suite fails on pandas 0.19.2
>
> Key: ARROW-1054
> URL: https://issues.apache.org/jira/browse/ARROW-1054
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.4.0
> Reporter: Wes McKinney
> Fix For: 0.4.0
>
> The test test_pandas_serialize_round_trip_multi_index fails. Not a blocker
> for 0.4.0, but we should fix.
[jira] [Created] (ARROW-1054) [Python] Test suite fails on pandas 0.19.2
Wes McKinney created ARROW-1054:

Summary: [Python] Test suite fails on pandas 0.19.2
Key: ARROW-1054
URL: https://issues.apache.org/jira/browse/ARROW-1054
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.4.0
Reporter: Wes McKinney

The test test_pandas_serialize_round_trip_multi_index fails. Not a blocker for 0.4.0, but we should fix.
[jira] [Resolved] (ARROW-1053) [Python] Memory leak with RecordBatchFileReader
[ https://issues.apache.org/jira/browse/ARROW-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-1053.

Resolution: Fixed

Issue resolved by pull request 704
[https://github.com/apache/arrow/pull/704]

> [Python] Memory leak with RecordBatchFileReader
>
> Key: ARROW-1053
> URL: https://issues.apache.org/jira/browse/ARROW-1053
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Bryan Cutler
> Assignee: Wes McKinney
> Fix For: 0.4.0
>
> While working on SPARK-13534 and running repeated calls to {{toPandas}},
> I saw memory usage continue to climb, and I isolated the problem to the
> Python side. The following code reproduces the issue, which looks like a
> memory leak. When I comment out the block with the {{RecordBatchFileReader}}
> but leave the writer in place, memory usage is stable, so I believe the issue
> is with the reader.
> {noformat}
> import pyarrow as pa
> import numpy as np
> import memory_profiler
> import gc
> import io
>
> def leak():
>     data = [pa.array(np.concatenate([np.random.randn(10)] * 10))]
>     table = pa.Table.from_arrays(data, ['foo'])
>     while True:
>         print('calling to_pandas')
>         print('memory_usage: {0}'.format(memory_profiler.memory_usage()))
>         df = table.to_pandas()
>         batch = pa.RecordBatch.from_pandas(df)
>         sink = io.BytesIO()
>         writer = pa.RecordBatchFileWriter(sink, batch.schema)
>         writer.write_batch(batch)
>         writer.close()
>         reader = pa.open_file(pa.BufferReader(sink.getvalue()))
>         reader.read_all()
>         gc.collect()
>
> leak()
> {noformat}
> Some of the output from the code above:
> {noformat}
> calling to_pandas
> memory_usage: [67.0546875]
> calling to_pandas
> memory_usage: [143.95703125]
> calling to_pandas
> memory_usage: [151.58984375]
> calling to_pandas
> memory_usage: [174.453125]
> calling to_pandas
> memory_usage: [189.84765625]
> calling to_pandas
> memory_usage: [212.7109375]
> calling to_pandas
> memory_usage: [228.046875]
> calling to_pandas
> memory_usage: [243.109375]
> calling to_pandas
> memory_usage: [258.4375]
> calling to_pandas
> memory_usage: [273.83203125]
> calling to_pandas
> memory_usage: [288.90234375]
> calling to_pandas
> memory_usage: [304.23046875]
> calling to_pandas
> memory_usage: [319.625]
> {noformat}
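To put a number on the growth in the ARROW-1053 output above, the readings can simply be differenced. The sketch below reuses the reported values as-is (`memory_profiler.memory_usage()` reports MiB); the helper names are illustrative, and treating the first iteration as warm-up is an assumption rather than something stated in the report.

```python
# memory_usage readings (MiB) copied verbatim from the ARROW-1053 report.
readings = [
    67.0546875, 143.95703125, 151.58984375, 174.453125, 189.84765625,
    212.7109375, 228.046875, 243.109375, 258.4375, 273.83203125,
    288.90234375, 304.23046875, 319.625,
]


def per_iteration_growth(samples):
    """Difference between each pair of consecutive readings."""
    return [b - a for a, b in zip(samples, samples[1:])]


def mean_growth(samples, skip=0):
    """Average growth per iteration, optionally ignoring the first `skip` samples."""
    s = samples[skip:]
    return (s[-1] - s[0]) / (len(s) - 1)


deltas = per_iteration_growth(readings)
print("every step grows:", all(d > 0 for d in deltas))
print("mean growth, all iterations: %.2f MiB" % mean_growth(readings))
print("mean growth after warm-up:   %.2f MiB" % mean_growth(readings, skip=1))
```

On these readings every step is positive: memory grows by about 21 MiB per iteration overall, or about 16 MiB per iteration once the first jump is excluded, which is the steady climb described in the report.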