[jira] [Created] (ARROW-17719) [Python] Improve error message when all values in a column are null in a parquet partition

2022-09-13 Thread Philipp Moritz (Jira)
Philipp Moritz created ARROW-17719:
--

 Summary: [Python] Improve error message when all values in a 
column are null in a parquet partition
 Key: ARROW-17719
 URL: https://issues.apache.org/jira/browse/ARROW-17719
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 9.0.0
Reporter: Philipp Moritz
 Fix For: 10.0.0


There is a good bug report about this in 
[https://stackoverflow.com/a/70568419/10891801] and it still seems to be a 
problem.

The error message is quite unhelpful if all values in a given column of a 
parquet partition are null. We should either handle this case better or give a 
clearer error message.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17079) Improve error message propagation from AWS SDK

2022-07-15 Thread Philipp Moritz (Jira)
Philipp Moritz created ARROW-17079:
--

 Summary: Improve error message propagation from AWS SDK
 Key: ARROW-17079
 URL: https://issues.apache.org/jira/browse/ARROW-17079
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 8.0.0
Reporter: Philipp Moritz


Dear all,

I'd like to see if there is interest in improving the error messages that 
originate from the AWS SDK. Especially when loading datasets from S3, there are 
many things that can go wrong, and the error messages that (Py)Arrow gives are 
not always actionable, especially if the call involves many different 
SDK functions. In particular, it would be great to have the following attached 
to each error message:
 * A machine parseable status code from the AWS SDK
 * Information as to exactly which AWS SDK call failed, so it can be 
disambiguated for Arrow API calls that use multiple AWS SDK calls

In the ideal case, as a developer I could reconstruct the AWS SDK call that 
failed from the error message (e.g. in a form that allows me to run the API call 
via the "aws" CLI program) so I can debug errors and see how they relate to my 
AWS infrastructure. Any progress in this direction would be super helpful.
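The two bullet points above can be sketched as a small data structure (hypothetical names, not Arrow's actual API): a machine-parseable AWS error code plus the exact SDK call and its arguments, enough to replay the call with the "aws" CLI.

```python
from dataclasses import dataclass


@dataclass
class S3ErrorDetail:
    """Sketch of the detail we'd like attached to each S3-related error."""
    aws_error_code: str  # machine-parseable code from the AWS SDK, e.g. "AccessDenied"
    sdk_call: str        # which SDK call failed (disambiguates multi-call Arrow APIs)
    args: dict           # call arguments, enough to reconstruct the request

    def to_cli(self) -> str:
        # Best-effort reconstruction of an equivalent "aws" CLI invocation.
        flags = " ".join(f"--{k.replace('_', '-')} {v}" for k, v in self.args.items())
        return f"aws s3api {self.sdk_call} {flags}"


detail = S3ErrorDetail("AccessDenied", "head-object",
                       {"bucket": "my-bucket", "key": "data/part-0.parquet"})
print(detail.to_cli())
# aws s3api head-object --bucket my-bucket --key data/part-0.parquet
```

With something like this attached, a permission error would point directly at the failing request instead of leaving the user to guess which of several SDK calls went wrong.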

 

For context: I was recently debugging some permission issues in S3 based on 
the current error codes and it was quite hard to figure out what was going on 
(see [https://github.com/ray-project/ray/issues/19799#issuecomment-1185035602]).

 

I'm happy to take a stab at this problem but might need some help. Is 
implementing a custom StatusDetail class for AWS errors and propagating errors 
that way the right approach here? 
[https://github.com/apache/arrow/blob/50f6fcad6cc09c06e78dcd09ad07218b86e689de/cpp/src/arrow/status.h#L110]

 

All the best,

Philipp.





[jira] [Resolved] (ARROW-7991) [C++][Plasma] Allow option for evicting if full when creating an object

2020-03-08 Thread Philipp Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-7991.
---
Fix Version/s: (was: 0.16.1)
   1.0.0
   Resolution: Fixed

Issue resolved by pull request 6520
[https://github.com/apache/arrow/pull/6520]

> [C++][Plasma] Allow option for evicting if full when creating an object
> ---
>
> Key: ARROW-7991
> URL: https://issues.apache.org/jira/browse/ARROW-7991
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Affects Versions: 0.16.0
>Reporter: Stephanie Wang
>Assignee: Stephanie Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Plasma currently attempts to evict objects whenever the client tries to 
> create an object and there is not enough space. Sometimes, though, it is 
> preferable to allow the client to try something else, such as skipping 
> creation or freeing other objects. This enhancement would allow the client to 
> pass in a flag during object creation specifying whether objects should be 
> evicted or not.
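The proposed flag can be sketched with a toy LRU store in Python (hypothetical names, not Plasma's real API): with the flag set, creation evicts least-recently-used objects to make room; without it, creation fails and the client can try something else.

```python
from collections import OrderedDict


class ToyStore:
    """Toy model of an object store with an evict-on-create flag."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.objects = OrderedDict()  # id -> size; front = least recently used

    def used(self):
        return sum(self.objects.values())

    def create(self, object_id, size, evict_if_full=True):
        if size > self.capacity:
            return "OutOfMemory"
        while self.used() + size > self.capacity:
            if not evict_if_full:
                # Let the client decide: skip creation, free other objects, etc.
                return "OutOfMemory"
            self.objects.popitem(last=False)  # evict the LRU object
        self.objects[object_id] = size
        return "OK"


store = ToyStore(capacity=100)
store.create("a", 60)
store.create("b", 30)
print(store.create("c", 30, evict_if_full=False))  # OutOfMemory, nothing evicted
print(store.create("c", 30, evict_if_full=True))   # evicts "a", then succeeds
```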





[jira] [Resolved] (ARROW-7998) [C++][Plasma] Make Seal requests synchronous

2020-03-05 Thread Philipp Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-7998.
---
Fix Version/s: (was: 0.16.1)
   1.0.0
   Resolution: Fixed

Issue resolved by pull request 6529
[https://github.com/apache/arrow/pull/6529]

> [C++][Plasma] Make Seal requests synchronous
> 
>
> Key: ARROW-7998
> URL: https://issues.apache.org/jira/browse/ARROW-7998
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Affects Versions: 0.16.0
>Reporter: Stephanie Wang
>Assignee: Stephanie Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When handling a `Seal` request to create an object and make it visible to 
> other clients, the plasma store does not wait until the seal is complete 
> before responding to the requesting client. This makes the interface hard to 
> use, since the client is not guaranteed that the object is visible yet and 
> would have to use an additional IPC round-trip to determine when the object 
> is ready.
>  
> This improvement would require the plasma store to wait until the object has 
> been created before responding to the client.
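The requested guarantee can be sketched with a condition variable in Python (a toy model, not Plasma's wire protocol): the store marks the object visible before acknowledging the seal, so a client never needs an extra round-trip to poll for readiness.

```python
import threading


class ToySealStore:
    """Toy model: seal() returns only after the object is visible."""

    def __init__(self):
        self.sealed = set()
        self.cond = threading.Condition()

    def seal(self, object_id):
        # Make the object visible, then notify; only after this does the
        # (synchronous) seal request complete.
        with self.cond:
            self.sealed.add(object_id)
            self.cond.notify_all()

    def wait_until_sealed(self, object_id, timeout=1.0):
        with self.cond:
            return self.cond.wait_for(lambda: object_id in self.sealed, timeout)


store = ToySealStore()
threading.Thread(target=store.seal, args=("obj",)).start()
print(store.wait_until_sealed("obj"))  # True: visible once seal() has run
```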





[jira] [Commented] (ARROW-7522) [C++][Plasma] Broken Record Batch returned from a function call

2020-01-11 Thread Philipp Moritz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013588#comment-17013588
 ] 

Philipp Moritz commented on ARROW-7522:
---

Disconnecting from the PlasmaClient and letting it go out of scope is fine, the 
memory mapped files will still be kept alive. The problem in this code example 
is that the buffer of the object is not kept alive (the buffer in the `auto 
buffer = object_buffer.data;` line). If that buffer is kept alive, this shared 
pointer here 
[https://github.com/apache/arrow/blob/b218a7fdae0792e185579d8cd20748ed0752b9ff/cpp/src/plasma/client.cc#L137]
 will make sure the PlasmaClient is kept alive, which will make sure the memory 
maps are kept alive.

To fix this, we would need some way to set a "base" object of an 
`arrow::RecordBatch` (similar to numpy base objects) which would make sure the 
backing buffer is kept alive. As a workaround you can also keep the 
PlasmaClient alive, but that feels very brittle.
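The numpy behavior referenced above (assuming numpy is installed) shows the pattern: a view keeps its owning array alive through its `.base` attribute, which is exactly the kind of "base object" an `arrow::RecordBatch` would need in order to pin its Plasma-backed buffer.

```python
import numpy as np

a = np.arange(10)
view = a[2:5]          # no copy; shares a's memory
print(view.base is a)  # True: the view holds a reference to its owner
del a                  # the buffer survives because view.base still refers to it
print(view.sum())      # 2 + 3 + 4 = 9, still readable after del a
```

In the C++ case there is no such back-reference, so once the function-local `PlasmaClient` and buffer go out of scope, the record batch is left pointing into unmapped memory.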

> [C++][Plasma] Broken Record Batch returned from a function call
> ---
>
> Key: ARROW-7522
> URL: https://issues.apache.org/jira/browse/ARROW-7522
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, C++ - Plasma
>Affects Versions: 0.15.1
> Environment: macOS
>Reporter: Chengxin Ma
>Priority: Minor
>
> Scenario: retrieving Record Batch from Plasma with known Object ID.
> The following code snippet works well:
> {code:java}
> int main(int argc, char **argv)
> {
>   plasma::ObjectID object_id =
>       plasma::ObjectID::from_binary("0FF1CE00C0FFEE00BEEF");
>   // Start up and connect a Plasma client.
>   plasma::PlasmaClient client;
>   ARROW_CHECK_OK(client.Connect("/tmp/store"));
>   plasma::ObjectBuffer object_buffer;
>   ARROW_CHECK_OK(client.Get(&object_id, 1, -1, &object_buffer));
>   // Retrieve object data.
>   auto buffer = object_buffer.data;
>   arrow::io::BufferReader buffer_reader(buffer);
>   std::shared_ptr<arrow::ipc::RecordBatchStreamReader> record_batch_stream_reader;
>   ARROW_CHECK_OK(arrow::ipc::RecordBatchStreamReader::Open(
>       &buffer_reader, &record_batch_stream_reader));
>   std::shared_ptr<arrow::RecordBatch> record_batch;
>   arrow::Status status = record_batch_stream_reader->ReadNext(&record_batch);
>   std::cout << "record_batch->column_name(0): "
>             << record_batch->column_name(0) << std::endl;
>   std::cout << "record_batch->num_columns(): "
>             << record_batch->num_columns() << std::endl;
>   std::cout << "record_batch->num_rows(): "
>             << record_batch->num_rows() << std::endl;
>   std::cout << "record_batch->column(0)->length(): "
>             << record_batch->column(0)->length() << std::endl;
>   std::cout << "record_batch->column(0)->ToString(): "
>             << record_batch->column(0)->ToString() << std::endl;
> }
> {code}
> {{record_batch->column(0)->ToString()}} would incur a segmentation fault if 
> retrieving Record Batch is wrapped in a function:
> {code:java}
> std::shared_ptr<arrow::RecordBatch> GetRecordBatchFromPlasma(
>     plasma::ObjectID object_id)
> {
>   // Start up and connect a Plasma client.
>   plasma::PlasmaClient client;
>   ARROW_CHECK_OK(client.Connect("/tmp/store"));
>   plasma::ObjectBuffer object_buffer;
>   ARROW_CHECK_OK(client.Get(&object_id, 1, -1, &object_buffer));
>   // Retrieve object data.
>   auto buffer = object_buffer.data;
>   arrow::io::BufferReader buffer_reader(buffer);
>   std::shared_ptr<arrow::ipc::RecordBatchStreamReader> record_batch_stream_reader;
>   ARROW_CHECK_OK(arrow::ipc::RecordBatchStreamReader::Open(
>       &buffer_reader, &record_batch_stream_reader));
>   std::shared_ptr<arrow::RecordBatch> record_batch;
>   arrow::Status status = record_batch_stream_reader->ReadNext(&record_batch);
>   // Disconnect the client.
>   ARROW_CHECK_OK(client.Disconnect());
>   return record_batch;
> }
> int main(int argc, char **argv)
> {
>   plasma::ObjectID object_id =
>       plasma::ObjectID::from_binary("0FF1CE00C0FFEE00BEEF");
>   std::shared_ptr<arrow::RecordBatch> record_batch =
>       GetRecordBatchFromPlasma(object_id);
>   std::cout << "record_batch->column_name(0): "
>             << record_batch->column_name(0) << std::endl;
>   std::cout << "record_batch->num_columns(): "
>             << record_batch->num_columns() << std::endl;
>   std::cout << "record_batch->num_rows(): "
>             << record_batch->num_rows() << std::endl;
>   std::cout << "record_batch->column(0)->length(): "
>             << record_batch->column(0)->length() << std::endl;
>   std::cout << "record_batch->column(0)->ToString(): "
>             << record_batch->column(0)->ToString() << std::endl;
> }
> {code}
> The meta info of the Record Batch such as number of columns and rows is still 
> available, but I can't see the content of the columns.
> {{lldb}} says that the stop reason is {{EXC_BAD_ACCESS}}, so I think the 
> Record Batch is destroyed after {{GetRecordBatchFromPlasma}} finishes. But 
> why can I still see the meta info of this Record Batch?

[jira] [Resolved] (ARROW-7004) [Plasma] Make it possible to bump up object in LRU cache

2019-10-30 Thread Philipp Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-7004.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5741
[https://github.com/apache/arrow/pull/5741]

> [Plasma] Make it possible to bump up object in LRU cache
> 
>
> Key: ARROW-7004
> URL: https://issues.apache.org/jira/browse/ARROW-7004
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> To avoid evicting objects too early, we sometimes want to bump a number of 
> objects up in the LRU cache. While it would be possible to call Get() on 
> these objects, this can be undesirable, since Get() blocks if the objects 
> are not available and makes it necessary to call Release() on them afterwards.
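The requested operation can be sketched with a toy LRU queue in Python (hypothetical names, not Plasma's API): a bump moves objects to the most-recently-used end without the blocking Get()/Release() dance, and simply skips objects that are not present.

```python
from collections import OrderedDict


class ToyLRU:
    """Toy LRU queue with a non-blocking bump operation."""

    def __init__(self):
        self.queue = OrderedDict()  # id -> size; front = next eviction victim

    def add(self, object_id, size):
        self.queue[object_id] = size

    def bump(self, object_ids):
        # Non-blocking: unknown ids are skipped instead of waited on,
        # unlike Get(), which would block until the object is available.
        for object_id in object_ids:
            if object_id in self.queue:
                self.queue.move_to_end(object_id)

    def evict_one(self):
        return self.queue.popitem(last=False)[0]


lru = ToyLRU()
for oid in ("a", "b", "c"):
    lru.add(oid, 1)
lru.bump(["a"])         # "a" is now most recently used
print(lru.evict_one())  # evicts "b", the oldest remaining object
```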





[jira] [Created] (ARROW-7004) [Plasma] Make it possible to bump up object in LRU cache

2019-10-28 Thread Philipp Moritz (Jira)
Philipp Moritz created ARROW-7004:
-

 Summary: [Plasma] Make it possible to bump up object in LRU cache
 Key: ARROW-7004
 URL: https://issues.apache.org/jira/browse/ARROW-7004
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Plasma
Reporter: Philipp Moritz
Assignee: Philipp Moritz


To avoid evicting objects too early, we sometimes want to bump a number of 
objects up in the LRU cache. While it would be possible to call Get() on these 
objects, this can be undesirable, since Get() blocks if the objects are not 
available and makes it necessary to call Release() on them afterwards.





[jira] [Resolved] (ARROW-6907) [C++][Plasma] Allow Plasma store to batch notifications to clients

2019-10-24 Thread Philipp Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-6907.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5626
[https://github.com/apache/arrow/pull/5626]

> [C++][Plasma] Allow Plasma store to batch notifications to clients
> --
>
> Key: ARROW-6907
> URL: https://issues.apache.org/jira/browse/ARROW-6907
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Reporter: Danyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-6824) [Plasma] Support batched create and seal requests for small objects

2019-10-09 Thread Philipp Moritz (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-6824.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5596
[https://github.com/apache/arrow/pull/5596]

> [Plasma] Support batched create and seal requests for small objects
> ---
>
> Key: ARROW-6824
> URL: https://issues.apache.org/jira/browse/ARROW-6824
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Affects Versions: 0.15.0
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the plasma create API supports creating and sealing a single object 
> – this makes sense for large objects, whose creation throughput is limited 
> by the memory throughput of the client as the data is filled into the 
> buffer. However, sometimes we want to create lots of small objects, in which 
> case the throughput is limited by the number of IPC round-trips to the store 
> required to create new objects. This can be fixed by offering a version of 
> CreateAndSeal that allows us to create multiple objects at the same time.





[jira] [Created] (ARROW-6824) [Plasma] Support batched create and seal requests for small objects

2019-10-09 Thread Philipp Moritz (Jira)
Philipp Moritz created ARROW-6824:
-

 Summary: [Plasma] Support batched create and seal requests for 
small objects
 Key: ARROW-6824
 URL: https://issues.apache.org/jira/browse/ARROW-6824
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Plasma
Affects Versions: 0.15.0
Reporter: Philipp Moritz


Currently the plasma create API supports creating and sealing a single object – 
this makes sense for large objects, whose creation throughput is limited 
by the memory throughput of the client as the data is filled into the buffer. 
However, sometimes we want to create lots of small objects, in which case the 
throughput is limited by the number of IPC round-trips to the store required 
to create new objects. This can be fixed by offering a version of CreateAndSeal 
that allows us to create multiple objects at the same time.
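The arithmetic behind the proposal can be sketched by counting round-trips in a toy client (hypothetical names, not Plasma's API): per-object calls cost one IPC each, while a batched call pays one IPC for the whole batch.

```python
class ToyClient:
    """Toy model counting IPC round-trips to the store."""

    def __init__(self):
        self.ipc_count = 0  # number of round-trips to the store

    def create_and_seal(self, object_id, data):
        self.ipc_count += 1  # one IPC per object

    def batched_create_and_seal(self, objects):
        self.ipc_count += 1  # a single IPC carries the whole batch


client = ToyClient()
for i in range(1000):
    client.create_and_seal(f"obj-{i}", b"x")
print(client.ipc_count)  # 1000 round-trips for 1000 small objects

client = ToyClient()
client.batched_create_and_seal({f"obj-{i}": b"x" for i in range(1000)})
print(client.ipc_count)  # 1 round-trip for the same 1000 objects
```

For small objects the IPC cost dominates the copy cost, which is why batching helps here but matters little for large objects.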





[jira] [Assigned] (ARROW-5955) [Plasma] Support setting memory quotas per plasma client for better isolation

2019-07-25 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz reassigned ARROW-5955:
-

Assignee: Eric Liang

> [Plasma] Support setting memory quotas per plasma client for better isolation
> -
>
> Key: ARROW-5955
> URL: https://issues.apache.org/jira/browse/ARROW-5955
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Affects Versions: 0.14.1
>Reporter: Eric Liang
>Assignee: Eric Liang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Currently, plasma evicts objects according to a global LRU queue. In Ray, this 
> often causes memory-intensive workloads to fail unpredictably, since a client 
> that creates objects at a high rate can evict objects created by clients at 
> lower rates. This is despite the fact that the true working set of both 
> clients may be quite small.
> cc [~pcmoritz]
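The isolation goal can be sketched with a toy quota store in Python (hypothetical names, not the actual implementation): once a client exceeds its own quota, eviction is confined to that client's objects, so a fast producer cannot flush out a slow client's small working set.

```python
class ToyQuotaStore:
    """Toy model: over-quota clients evict only their own LRU objects."""

    def __init__(self, quota):
        self.quota = quota
        self.per_client = {}  # client -> list of (object_id, size), LRU first

    def create(self, client, object_id, size):
        objs = self.per_client.setdefault(client, [])
        used = sum(s for _, s in objs)
        while used + size > self.quota and objs:
            _, freed = objs.pop(0)  # evict this client's own LRU object
            used -= freed
        objs.append((object_id, size))


store = ToyQuotaStore(quota=2)
store.create("slow", "s1", 1)
for i in range(5):
    store.create("fast", f"f{i}", 1)  # churns only its own objects
print([oid for oid, _ in store.per_client["slow"]])  # ['s1'] survives
```

Contrast this with a single global LRU queue, where the five creations by "fast" would have pushed "s1" out despite the slow client's tiny working set.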





[jira] [Updated] (ARROW-5955) [Plasma] Support setting memory quotas per plasma client for better isolation

2019-07-25 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz updated ARROW-5955:
--
Component/s: C++ - Plasma

> [Plasma] Support setting memory quotas per plasma client for better isolation
> -
>
> Key: ARROW-5955
> URL: https://issues.apache.org/jira/browse/ARROW-5955
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Reporter: Eric Liang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Currently, plasma evicts objects according to a global LRU queue. In Ray, this 
> often causes memory-intensive workloads to fail unpredictably, since a client 
> that creates objects at a high rate can evict objects created by clients at 
> lower rates. This is despite the fact that the true working set of both 
> clients may be quite small.
> cc [~pcmoritz]





[jira] [Updated] (ARROW-5955) [Plasma] Support setting memory quotas per plasma client for better isolation

2019-07-25 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz updated ARROW-5955:
--
Affects Version/s: 0.14.1

> [Plasma] Support setting memory quotas per plasma client for better isolation
> -
>
> Key: ARROW-5955
> URL: https://issues.apache.org/jira/browse/ARROW-5955
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Affects Versions: 0.14.1
>Reporter: Eric Liang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Currently, plasma evicts objects according to a global LRU queue. In Ray, this 
> often causes memory-intensive workloads to fail unpredictably, since a client 
> that creates objects at a high rate can evict objects created by clients at 
> lower rates. This is despite the fact that the true working set of both 
> clients may be quite small.
> cc [~pcmoritz]





[jira] [Resolved] (ARROW-5955) [Plasma] Support setting memory quotas per plasma client for better isolation

2019-07-25 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-5955.
---
   Resolution: Fixed
Fix Version/s: 1.0.0

Issue resolved by pull request 4885
[https://github.com/apache/arrow/pull/4885]

> [Plasma] Support setting memory quotas per plasma client for better isolation
> -
>
> Key: ARROW-5955
> URL: https://issues.apache.org/jira/browse/ARROW-5955
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Eric Liang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently, plasma evicts objects according to a global LRU queue. In Ray, this 
> often causes memory-intensive workloads to fail unpredictably, since a client 
> that creates objects at a high rate can evict objects created by clients at 
> lower rates. This is despite the fact that the true working set of both 
> clients may be quite small.
> cc [~pcmoritz]





[jira] [Assigned] (ARROW-5560) Cannot create Plasma object after OutOfMemory error

2019-07-16 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz reassigned ARROW-5560:
-

Assignee: Richard Liaw

> Cannot create Plasma object after OutOfMemory error
> ---
>
> Key: ARROW-5560
> URL: https://issues.apache.org/jira/browse/ARROW-5560
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Plasma
>Affects Versions: 0.13.0
>Reporter: Stephanie Wang
>Assignee: Richard Liaw
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.1
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> If the client tries to call `CreateObject` and there is not enough memory 
> left in the object store to create it, an `OutOfMemory` error will be 
> returned. However, the plasma store also creates an entry for the object, 
> even though it failed to be created. This means that later on, if the client 
> tries to create the object again, it will receive an error that the object 
> already exists. Also, if the client tries to get the object, it will hang 
> because the entry appears to be unsealed.
> We should fix this by only creating the object entry if the `CreateObject` 
> operation succeeds.
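The proposed fix can be sketched in a toy store (hypothetical names, not Plasma's API): register the object entry only after the allocation succeeds, so a failed create leaves nothing behind and a retry works.

```python
class ToyObjectStore:
    """Toy model: entries exist only for successfully created objects."""

    def __init__(self, capacity):
        self.free = capacity
        self.entries = {}  # id -> size; no entry for failed creations

    def create_object(self, object_id, size):
        if object_id in self.entries:
            return "ObjectExists"
        if size > self.free:
            # Fail *before* touching self.entries: a later retry or Get()
            # will not see a phantom, forever-unsealed object.
            return "OutOfMemory"
        self.free -= size
        self.entries[object_id] = size
        return "OK"


store = ToyObjectStore(capacity=10)
print(store.create_object("big", 20))  # OutOfMemory, no entry created
print(store.create_object("big", 5))   # OK: the retry succeeds cleanly
```

In the buggy behavior described above, the first call would have left an entry behind, making the retry fail with "ObjectExists" and making Get() hang on an object that will never be sealed.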





[jira] [Resolved] (ARROW-5560) Cannot create Plasma object after OutOfMemory error

2019-07-16 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-5560.
---
   Resolution: Fixed
Fix Version/s: (was: 0.13.0)
   0.14.1

Issue resolved by pull request 4850
[https://github.com/apache/arrow/pull/4850]

> Cannot create Plasma object after OutOfMemory error
> ---
>
> Key: ARROW-5560
> URL: https://issues.apache.org/jira/browse/ARROW-5560
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Plasma
>Affects Versions: 0.13.0
>Reporter: Stephanie Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.1
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> If the client tries to call `CreateObject` and there is not enough memory 
> left in the object store to create it, an `OutOfMemory` error will be 
> returned. However, the plasma store also creates an entry for the object, 
> even though it failed to be created. This means that later on, if the client 
> tries to create the object again, it will receive an error that the object 
> already exists. Also, if the client tries to get the object, it will hang 
> because the entry appears to be unsealed.
> We should fix this by only creating the object entry if the `CreateObject` 
> operation succeeds.





[jira] [Updated] (ARROW-5904) [Java] [Plasma] Fix compilation of Plasma Java client

2019-07-11 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz updated ARROW-5904:
--
Component/s: C++ - Plasma

> [Java] [Plasma] Fix compilation of Plasma Java client
> -
>
> Key: ARROW-5904
> URL: https://issues.apache.org/jira/browse/ARROW-5904
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is broken since the introduction of user-defined Status messages:
> {code:java}
> external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:
>  In function '_jobject* 
> Java_org_apache_arrow_plasma_PlasmaClientJNI_create(JNIEnv*, jclass, jlong, 
> jbyteArray, jint, jbyteArray)':
> external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:114:9:
>  error: 'class arrow::Status' has no member named 'IsPlasmaObjectExists'
>if (s.IsPlasmaObjectExists()) {
>  ^
> external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:120:9:
>  error: 'class arrow::Status' has no member named 'IsPlasmaStoreFull'
>if (s.IsPlasmaStoreFull()) {
>  ^{code}
> [~guoyuhong85] Can you add this codepath to the test so we can catch this 
> kind of breakage more quickly in the future?





[jira] [Assigned] (ARROW-5904) [Java] [Plasma] Fix compilation of Plasma Java client

2019-07-11 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz reassigned ARROW-5904:
-

Assignee: Philipp Moritz

> [Java] [Plasma] Fix compilation of Plasma Java client
> -
>
> Key: ARROW-5904
> URL: https://issues.apache.org/jira/browse/ARROW-5904
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is broken since the introduction of user-defined Status messages:
> {code:java}
> external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:
>  In function '_jobject* 
> Java_org_apache_arrow_plasma_PlasmaClientJNI_create(JNIEnv*, jclass, jlong, 
> jbyteArray, jint, jbyteArray)':
> external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:114:9:
>  error: 'class arrow::Status' has no member named 'IsPlasmaObjectExists'
>if (s.IsPlasmaObjectExists()) {
>  ^
> external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:120:9:
>  error: 'class arrow::Status' has no member named 'IsPlasmaStoreFull'
>if (s.IsPlasmaStoreFull()) {
>  ^{code}
> [~guoyuhong85] Can you add this codepath to the test so we can catch this 
> kind of breakage more quickly in the future?





[jira] [Resolved] (ARROW-5904) [Java] [Plasma] Fix compilation of Plasma Java client

2019-07-11 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-5904.
---
   Resolution: Fixed
Fix Version/s: 0.14.1

Issue resolved by pull request 4849
[https://github.com/apache/arrow/pull/4849]

> [Java] [Plasma] Fix compilation of Plasma Java client
> -
>
> Key: ARROW-5904
> URL: https://issues.apache.org/jira/browse/ARROW-5904
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is broken since the introduction of user-defined Status messages:
> {code:java}
> external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:
>  In function '_jobject* 
> Java_org_apache_arrow_plasma_PlasmaClientJNI_create(JNIEnv*, jclass, jlong, 
> jbyteArray, jint, jbyteArray)':
> external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:114:9:
>  error: 'class arrow::Status' has no member named 'IsPlasmaObjectExists'
>if (s.IsPlasmaObjectExists()) {
>  ^
> external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:120:9:
>  error: 'class arrow::Status' has no member named 'IsPlasmaStoreFull'
>if (s.IsPlasmaStoreFull()) {
>  ^{code}
> [~guoyuhong85] Can you add this codepath to the test so we can catch this 
> kind of breakage more quickly in the future?





[jira] [Commented] (ARROW-5904) [Java] [Plasma] Fix compilation of Plasma Java client

2019-07-11 Thread Philipp Moritz (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883254#comment-16883254
 ] 

Philipp Moritz commented on ARROW-5904:
---

Sounds good, I will prepare a separate PR for this!

> [Java] [Plasma] Fix compilation of Plasma Java client
> -
>
> Key: ARROW-5904
> URL: https://issues.apache.org/jira/browse/ARROW-5904
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is broken since the introduction of user-defined Status messages:
> {code:java}
> external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:
>  In function '_jobject* 
> Java_org_apache_arrow_plasma_PlasmaClientJNI_create(JNIEnv*, jclass, jlong, 
> jbyteArray, jint, jbyteArray)':
> external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:114:9:
>  error: 'class arrow::Status' has no member named 'IsPlasmaObjectExists'
>if (s.IsPlasmaObjectExists()) {
>  ^
> external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:120:9:
>  error: 'class arrow::Status' has no member named 'IsPlasmaStoreFull'
>if (s.IsPlasmaStoreFull()) {
>  ^{code}
> [~guoyuhong85] Can you add this codepath to the test so we can catch this 
> kind of breakage more quickly in the future?





[jira] [Commented] (ARROW-5904) [Java] [Plasma] Fix compilation of Plasma Java client

2019-07-10 Thread Philipp Moritz (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882562#comment-16882562
 ] 

Philipp Moritz commented on ARROW-5904:
---

We do need a working build configuration that builds both the C++ and Java 
files in order to test this. In Ray we do this with a Bazel-based build, which 
I'm happy to upstream and provide Docker files for. Would that help?

> [Java] [Plasma] Fix compilation of Plasma Java client
> -
>
> Key: ARROW-5904
> URL: https://issues.apache.org/jira/browse/ARROW-5904
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is broken since the introduction of user-defined Status messages:
> {code:java}
> external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:
>  In function '_jobject* 
> Java_org_apache_arrow_plasma_PlasmaClientJNI_create(JNIEnv*, jclass, jlong, 
> jbyteArray, jint, jbyteArray)':
> external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:114:9:
>  error: 'class arrow::Status' has no member named 'IsPlasmaObjectExists'
>if (s.IsPlasmaObjectExists()) {
>  ^
> external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:120:9:
>  error: 'class arrow::Status' has no member named 'IsPlasmaStoreFull'
>if (s.IsPlasmaStoreFull()) {
>  ^{code}
> [~guoyuhong85] Can you add this codepath to the test so we can catch this 
> kind of breakage more quickly in the future?





[jira] [Commented] (ARROW-5904) [Java] [Plasma] Fix compilation of Plasma Java client

2019-07-10 Thread Philipp Moritz (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882490#comment-16882490
 ] 

Philipp Moritz commented on ARROW-5904:
---

Looks like this is not currently tested because of 
https://issues.apache.org/jira/browse/ARROW-4764.

> [Java] [Plasma] Fix compilation of Plasma Java client
> -
>
> Key: ARROW-5904
> URL: https://issues.apache.org/jira/browse/ARROW-5904
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is broken since the introduction of user-defined Status messages:
> {code:java}
> external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:
>  In function '_jobject* 
> Java_org_apache_arrow_plasma_PlasmaClientJNI_create(JNIEnv*, jclass, jlong, 
> jbyteArray, jint, jbyteArray)':
> external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:114:9:
>  error: 'class arrow::Status' has no member named 'IsPlasmaObjectExists'
>if (s.IsPlasmaObjectExists()) {
>  ^
> external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:120:9:
>  error: 'class arrow::Status' has no member named 'IsPlasmaStoreFull'
>if (s.IsPlasmaStoreFull()) {
>  ^{code}
> [~guoyuhong85] Can you add this codepath to the test so we can catch this 
> kind of breakage more quickly in the future?





[jira] [Created] (ARROW-5904) [Java] [Plasma] Fix compilation of Plasma Java client

2019-07-10 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-5904:
-

 Summary: [Java] [Plasma] Fix compilation of Plasma Java client
 Key: ARROW-5904
 URL: https://issues.apache.org/jira/browse/ARROW-5904
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


This is broken since the introduction of user-defined Status messages:
{code:java}
external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:
 In function '_jobject* 
Java_org_apache_arrow_plasma_PlasmaClientJNI_create(JNIEnv*, jclass, jlong, 
jbyteArray, jint, jbyteArray)':
external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:114:9:
 error: 'class arrow::Status' has no member named 'IsPlasmaObjectExists'
   if (s.IsPlasmaObjectExists()) {
 ^
external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:120:9:
 error: 'class arrow::Status' has no member named 'IsPlasmaStoreFull'
   if (s.IsPlasmaStoreFull()) {
 ^{code}
[~guoyuhong85] Can you add this codepath to the test so we can catch this kind 
of breakage more quickly in the future?





[jira] [Updated] (ARROW-5751) [Packaging][Python] Python 2.7 wheels broken on macOS: libcares.2.dylib not found

2019-06-26 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz updated ARROW-5751:
--
Priority: Blocker  (was: Major)

> [Packaging][Python] Python 2.7 wheels broken on macOS: libcares.2.dylib not 
> found
> -
>
> Key: ARROW-5751
> URL: https://issues.apache.org/jira/browse/ARROW-5751
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Priority: Blocker
>
> I'm afraid that while [https://github.com/apache/arrow/pull/4685] fixed the 
> macOS wheels for Python 3, the Python 2.7 wheel is still broken (with a 
> different error):
> {code:java}
> ImportError: 
> dlopen(/Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so,
>  2): Library not loaded: /usr/local/opt/c-ares/lib/libcares.2.dylib
>   Referenced from: 
> /Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow_python.14.dylib
>   Reason: image not found{code}
> I tried the same hack as in [https://github.com/apache/arrow/pull/4685] for 
> libcares but it doesn't work (removing the .dylib fails one of the earlier 
> build steps). I think the only way to go forward on this is to compile grpc 
> ourselves. My attempt to do this in 
> [https://github.com/apache/arrow/compare/master...pcmoritz:mac-wheels-py2] 
> fails because OpenSSL is not found even though I'm specifying the 
> OPENSSL_ROOT_DIR (see 
> [https://travis-ci.org/pcmoritz/crossbow/builds/550603543]). Let me know if 
> you have any ideas how to fix this!





[jira] [Created] (ARROW-5751) [Packaging][Python] Python 2.7 wheels broken on macOS: libcares.2.dylib not found

2019-06-26 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-5751:
-

 Summary: [Packaging][Python] Python 2.7 wheels broken on macOS: 
libcares.2.dylib not found
 Key: ARROW-5751
 URL: https://issues.apache.org/jira/browse/ARROW-5751
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


I'm afraid that while [https://github.com/apache/arrow/pull/4685] fixed the 
macOS wheels for Python 3, the Python 2.7 wheel is still broken (with a 
different error):
{code:java}
ImportError: dlopen(/Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, 2): Library not loaded: /usr/local/opt/c-ares/lib/libcares.2.dylib
  Referenced from: /Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow_python.14.dylib
  Reason: image not found{code}
I tried the same hack as in [https://github.com/apache/arrow/pull/4685] for 
libcares but it doesn't work (removing the .dylib fails one of the earlier 
build steps). I think the only way to go forward on this is to compile grpc 
ourselves. My attempt to do this in 
[https://github.com/apache/arrow/compare/master...pcmoritz:mac-wheels-py2] 
fails because OpenSSL is not found even though I'm specifying the 
OPENSSL_ROOT_DIR (see 
[https://travis-ci.org/pcmoritz/crossbow/builds/550603543]). Let me know if you 
have any ideas how to fix this!





[jira] [Commented] (ARROW-5690) [Packaging][Python] macOS wheels broken: libprotobuf.18.dylib missing

2019-06-23 Thread Philipp Moritz (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870781#comment-16870781
 ] 

Philipp Moritz commented on ARROW-5690:
---

Linking protobuf statically leads to the following error:
{code:java}
ImportError: dlopen(/Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, 2): Library not loaded: /usr/local/opt/grpc/lib/libgrpc++.dylib
  Referenced from: /Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow_python.14.dylib
  Reason: image not found{code}
So we might need to bundle GRPC (but I'm not sure about that). Do we have any 
configurations in the build system where we do that at the moment?

> [Packaging][Python] macOS wheels broken: libprotobuf.18.dylib missing
> -
>
> Key: ARROW-5690
> URL: https://issues.apache.org/jira/browse/ARROW-5690
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Blocker
> Fix For: 0.14.0
>
>
> If I build macOS arrow wheels with crossbow from the latest master 
> (a77257f4790c562dcb74724fc4a22c157ab36018) and install them, importing 
> pyarrow gives the following error message:
> {code:java}
> In [1]: import pyarrow                                                        
>                                                                               
>                          
> ---
> ImportError                               Traceback (most recent call last)
>  in 
> > 1 import pyarrow
> ~/anaconda3/lib/python3.6/site-packages/pyarrow/__init__.py in 
>      47 import pyarrow.compat as compat
>      48
> ---> 49 from pyarrow.lib import cpu_count, set_cpu_count
>      50 from pyarrow.lib import (null, bool_,
>      51                          int8, int16, int32, int64,
> ImportError: 
> dlopen(/Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so,
>  2): Library not loaded: /usr/local/opt/protobuf/lib/libprotobuf.18.dylib
>   Referenced from: 
> /Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow.14.dylib
>   Reason: image not found{code}
>  





[jira] [Assigned] (ARROW-5690) [Packaging][Python] macOS wheels broken: libprotobuf.18.dylib missing

2019-06-23 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz reassigned ARROW-5690:
-

Assignee: Philipp Moritz

> [Packaging][Python] macOS wheels broken: libprotobuf.18.dylib missing
> -
>
> Key: ARROW-5690
> URL: https://issues.apache.org/jira/browse/ARROW-5690
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Blocker
> Fix For: 0.14.0
>
>
> If I build macOS arrow wheels with crossbow from the latest master 
> (a77257f4790c562dcb74724fc4a22c157ab36018) and install them, importing 
> pyarrow gives the following error message:
> {code:java}
> In [1]: import pyarrow                                                        
>                                                                               
>                          
> ---
> ImportError                               Traceback (most recent call last)
>  in 
> > 1 import pyarrow
> ~/anaconda3/lib/python3.6/site-packages/pyarrow/__init__.py in 
>      47 import pyarrow.compat as compat
>      48
> ---> 49 from pyarrow.lib import cpu_count, set_cpu_count
>      50 from pyarrow.lib import (null, bool_,
>      51                          int8, int16, int32, int64,
> ImportError: 
> dlopen(/Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so,
>  2): Library not loaded: /usr/local/opt/protobuf/lib/libprotobuf.18.dylib
>   Referenced from: 
> /Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow.14.dylib
>   Reason: image not found{code}
>  





[jira] [Created] (ARROW-5690) [Packaging] macOS wheels broken: libprotobuf.18.dylib missing

2019-06-22 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-5690:
-

 Summary: [Packaging] macOS wheels broken: libprotobuf.18.dylib 
missing
 Key: ARROW-5690
 URL: https://issues.apache.org/jira/browse/ARROW-5690
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


If I build macOS arrow wheels with crossbow from the latest master 
(a77257f4790c562dcb74724fc4a22c157ab36018) and install them, importing pyarrow 
gives the following error message:
{code:java}
In [1]: import pyarrow
---
ImportError                               Traceback (most recent call last)
 in 
> 1 import pyarrow

~/anaconda3/lib/python3.6/site-packages/pyarrow/__init__.py in 
     47 import pyarrow.compat as compat
     48
---> 49 from pyarrow.lib import cpu_count, set_cpu_count
     50 from pyarrow.lib import (null, bool_,
     51                          int8, int16, int32, int64,

ImportError: dlopen(/Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, 2): Library not loaded: /usr/local/opt/protobuf/lib/libprotobuf.18.dylib
  Referenced from: /Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow.14.dylib
  Reason: image not found{code}
 





[jira] [Commented] (ARROW-5670) [crossbow] mac os python 3.5 wheel failing

2019-06-20 Thread Philipp Moritz (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868898#comment-16868898
 ] 

Philipp Moritz commented on ARROW-5670:
---

Sounds good! The following patch

[https://github.com/apache/arrow/compare/master...pcmoritz:python-urllib]

and running `pip install requests[security]` fixed it for me.

> [crossbow] mac os python 3.5 wheel failing
> --
>
> Key: ARROW-5670
> URL: https://issues.apache.org/jira/browse/ARROW-5670
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Priority: Major
>
> Currently the macOS Python 3.5 wheel is failing with
> {code:java}
> Downloading Apache Thrift from Traceback (most recent call last):
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py",
>  line 1254, in do_open
> h.request(req.get_method(), req.selector, req.data, headers)
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py",
>  line 1107, in request
> self._send_request(method, url, body, headers)
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py",
>  line 1152, in _send_request
> self.endheaders(body)
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py",
>  line 1103, in endheaders
> self._send_output(message_body)
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py",
>  line 934, in _send_output
> self.send(msg)
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py",
>  line 877, in send
> self.connect()
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py",
>  line 1261, in connect
> server_hostname=server_hostname)
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", 
> line 385, in wrap_socket
> _context=self)
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", 
> line 760, in __init__
> self.do_handshake()
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", 
> line 996, in do_handshake
> self._sslobj.do_handshake()
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", 
> line 641, in do_handshake
> self._sslobj.do_handshake()
> ssl.SSLError: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol 
> version (_ssl.c:719){code}
> I've been looking into this error and will try to push a fix (the openssl 
> version that is used with python 3.5 on macos is too old I think).





[jira] [Commented] (ARROW-5670) [crossbow] mac os python 3.5 wheel failing

2019-06-20 Thread Philipp Moritz (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868894#comment-16868894
 ] 

Philipp Moritz commented on ARROW-5670:
---

Ah, ok, great! If you haven't looked into this particular issue, I'm happy to 
take it; it can be fixed by replacing urllib with the requests library in one 
of the helper scripts.

> [crossbow] mac os python 3.5 wheel failing
> --
>
> Key: ARROW-5670
> URL: https://issues.apache.org/jira/browse/ARROW-5670
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Priority: Major
>
> Currently the macOS Python 3.5 wheel is failing with
> {code:java}
> Downloading Apache Thrift from Traceback (most recent call last):
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py",
>  line 1254, in do_open
> h.request(req.get_method(), req.selector, req.data, headers)
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py",
>  line 1107, in request
> self._send_request(method, url, body, headers)
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py",
>  line 1152, in _send_request
> self.endheaders(body)
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py",
>  line 1103, in endheaders
> self._send_output(message_body)
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py",
>  line 934, in _send_output
> self.send(msg)
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py",
>  line 877, in send
> self.connect()
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py",
>  line 1261, in connect
> server_hostname=server_hostname)
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", 
> line 385, in wrap_socket
> _context=self)
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", 
> line 760, in __init__
> self.do_handshake()
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", 
> line 996, in do_handshake
> self._sslobj.do_handshake()
>   File 
> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", 
> line 641, in do_handshake
> self._sslobj.do_handshake()
> ssl.SSLError: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol 
> version (_ssl.c:719){code}
> I've been looking into this error and will try to push a fix (the openssl 
> version that is used with python 3.5 on macos is too old I think).





[jira] [Created] (ARROW-5671) [crossbow] mac os python wheels failing

2019-06-20 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-5671:
-

 Summary: [crossbow] mac os python wheels failing
 Key: ARROW-5671
 URL: https://issues.apache.org/jira/browse/ARROW-5671
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


Building (all?) macOS Python wheels is currently failing with
{code:java}
Traceback (most recent call last):
  File "", line 3, in 
  File "/Users/travis/build/pcmoritz/crossbow/venv/lib/python3.7/site-packages/pyarrow/__init__.py", line 49, in 
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: dlopen(/Users/travis/build/pcmoritz/crossbow/venv/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-darwin.so, 2): Library not loaded: @rpath/libarrow_boost_system.dylib
  Referenced from: /Users/travis/build/pcmoritz/crossbow/venv/lib/python3.7/site-packages/pyarrow/libarrow.14.dylib
  Reason: image not found{code}
Not sure where this was introduced :(





[jira] [Created] (ARROW-5670) [crossbow] mac os python 3.5 wheel failing

2019-06-20 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-5670:
-

 Summary: [crossbow] mac os python 3.5 wheel failing
 Key: ARROW-5670
 URL: https://issues.apache.org/jira/browse/ARROW-5670
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


Currently the macOS Python 3.5 wheel is failing with
{code:java}
Downloading Apache Thrift from Traceback (most recent call last):
  File 
"/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py",
 line 1254, in do_open
h.request(req.get_method(), req.selector, req.data, headers)
  File 
"/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py",
 line 1107, in request
self._send_request(method, url, body, headers)
  File 
"/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py",
 line 1152, in _send_request
self.endheaders(body)
  File 
"/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py",
 line 1103, in endheaders
self._send_output(message_body)
  File 
"/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py",
 line 934, in _send_output
self.send(msg)
  File 
"/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py",
 line 877, in send
self.connect()
  File 
"/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py",
 line 1261, in connect
server_hostname=server_hostname)
  File 
"/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 
385, in wrap_socket
_context=self)
  File 
"/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 
760, in __init__
self.do_handshake()
  File 
"/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 
996, in do_handshake
self._sslobj.do_handshake()
  File 
"/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 
641, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version 
(_ssl.c:719){code}
I've been looking into this error and will try to push a fix (the openssl 
version that is used with python 3.5 on macos is too old I think).





[jira] [Created] (ARROW-5669) [crossbow] manylinux1 wheel building failing

2019-06-20 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-5669:
-

 Summary: [crossbow] manylinux1 wheel building failing
 Key: ARROW-5669
 URL: https://issues.apache.org/jira/browse/ARROW-5669
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


I tried to set up a crossbow queue (on 
a0e1fbb9ef51d05a3f28e221cf8c5d4031a50c93), and right now building the 
manylinux1 wheels seems to be failing because of the arrow flight tests:

 
{code:java}
___ test_tls_do_get 
def test_tls_do_get():
"""Try a simple do_get call over TLS."""
table = simple_ints_table()
>   certs = example_tls_certs()
usr/local/lib/python3.6/site-packages/pyarrow/tests/test_flight.py:563: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
usr/local/lib/python3.6/site-packages/pyarrow/tests/test_flight.py:64: in 
example_tls_certs
"root_cert": read_flight_resource("root-ca.pem"),
usr/local/lib/python3.6/site-packages/pyarrow/tests/test_flight.py:48: in 
read_flight_resource
root = resource_root()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
def resource_root():
"""Get the path to the test resources directory."""
if not os.environ.get("ARROW_TEST_DATA"):
>   raise RuntimeError("Test resources not found; set "
   "ARROW_TEST_DATA to /testing")
E   RuntimeError: Test resources not found; set ARROW_TEST_DATA to 
/testing
usr/local/lib/python3.6/site-packages/pyarrow/tests/test_flight.py:41: 
RuntimeError{code}
This may have been introduced in 
[https://github.com/apache/arrow/pull/4594].

Any thoughts how we should proceed with this?





[jira] [Resolved] (ARROW-5283) [C++][Plasma] Server crash when creating an aborted object 3 times

2019-05-28 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-5283.
---
Resolution: Fixed

Issue resolved by pull request 4272
[https://github.com/apache/arrow/pull/4272]

> [C++][Plasma] Server crash when creating an aborted object 3 times
> --
>
> Key: ARROW-5283
> URL: https://issues.apache.org/jira/browse/ARROW-5283
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.13.0
>Reporter: shengjun.li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> cpp/CMakeLists.txt
>   option(ARROW_PLASMA "Build the plasma object store along with Arrow" ON)
> sequence:
> (1) call PlasmaClient::Create(id_object, data_size, 0, 0, , 0)
> (2) call PlasmaClient::Release(id_object)
> (3) call PlasmaClient::Abort(id_object)
> (4) call PlasmaClient::Create(id_object, data_size, 0, 0, , 0) // where 
> the id_object is the same as (1)
> (5) call PlasmaClient::Release(id_object)
> (6) call PlasmaClient::Abort(id_object)
> (7) call PlasmaClient::Create(id_object, data_size, 0, 0, , 0) // where 
> the id_object is the same as (1)
> server crash!
> F0508 10:03:09.546859 32587 eviction_policy.cc:27]  Check failed: it == 
> item_map_.end() 
> *** Check failure stack trace: ***
> *** Aborted at 1557280989 (unix time) try "date -d @1557280989" if you are 
> using GNU date ***
> PC: @ 0x7f5403a46428 gsignal
> *** SIGABRT (@0x3e87f4b) received by PID 32587 (TID 0x7f5406950f80) from 
> PID 32587; stack trace: ***
>     @ 0x7f5403dec390 (unknown)
>     @ 0x7f5403a46428 gsignal
>     @ 0x7f5403a4802a abort
>     @ 0x7f5405780f69 google::logging_fail()
>     @ 0x7f5405782a3d google::LogMessage::Fail()
>     @ 0x7f5405785054 google::LogMessage::SendToLog()
>     @ 0x7f540578255b google::LogMessage::Flush()
>     @ 0x7f5405782779 google::LogMessage::~LogMessage()
>     @ 0x7f54053f98bd arrow::util::ArrowLog::~ArrowLog()
>     @   0x4afcae plasma::LRUCache::Add()
>     @   0x4b00f1 plasma::EvictionPolicy::ObjectCreated()
>     @   0x4b61e0 plasma::PlasmaStore::CreateObject()
>     @   0x4babcc plasma::PlasmaStore::ProcessMessage()
>     @   0x4b95c3 _ZZN6plasma11PlasmaStore13ConnectClientEiENKUliE_clEi
>     @   0x4bdb80 
> _ZNSt17_Function_handlerIFviEZN6plasma11PlasmaStore13ConnectClientEiEUliE_E9_M_invokeERKSt9_Any_dataOi
>     @   0x4aba58 std::function<>::operator()()
>     @   0x4aaf67 plasma::EventLoop::FileEventCallback()
>     @   0x4dc1bd aeProcessEvents
>     @   0x4dc37e aeMain
>     @   0x4ab25b plasma::EventLoop::Start()
>     @   0x4c00c1 plasma::PlasmaStoreRunner::Start()
>     @   0x4bc77b plasma::StartServer()
>     @   0x4bd3eb main
>     @ 0x7f5403a31830 __libc_start_main
>     @   0x49e9f9 _start
>     @    0x0 (unknown)
> Aborted (core dumped)
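The quoted sequence (Create, Release, Abort twice, then a third Create of the same id) trips the `Check failed: it == item_map_.end()` assertion in eviction_policy.cc, which says an object id must not already be tracked when it is added. A minimal Python model of that LRU-cache invariant (a hypothetical simplification for illustration, not the actual plasma::LRUCache C++ code):

```python
# Simplified model of plasma::LRUCache's invariant: Add() requires the key
# to be absent. This mirrors `Check failed: it == item_map_.end()`.
from collections import OrderedDict

class LRUCache:
    def __init__(self):
        self._items = OrderedDict()  # object_id -> size, in LRU order

    def add(self, object_id, size):
        # Adding an id that is already tracked is a fatal invariant violation
        # in the real store; here we model it with an assertion.
        assert object_id not in self._items, "object already in cache"
        self._items[object_id] = size

    def remove(self, object_id):
        self._items.pop(object_id)

# Correct abort path: the id is removed from the cache, so re-creating the
# same id later is fine. If Abort() left the id behind, the next add() would
# hit the assertion, which is the shape of the reported crash.
cache = LRUCache()
cache.add("id1", 100)
cache.remove("id1")
cache.add("id1", 100)
```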





[jira] [Resolved] (ARROW-5186) [Plasma] Crash on deleting CUDA memory

2019-04-29 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-5186.
---
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4177
[https://github.com/apache/arrow/pull/4177]

> [Plasma] Crash on deleting CUDA memory
> --
>
> Key: ARROW-5186
> URL: https://issues.apache.org/jira/browse/ARROW-5186
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.13.0
>Reporter: shengjun.li
>Assignee: shengjun.li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> cpp/CMakeLists.txt
>   option(ARROW_CUDA "Build the Arrow CUDA extensions (requires CUDA toolkit)" 
> ON)
>   option(ARROW_PLASMA "Build the plasma object store along with Arrow" ON)
> [sample sequence]
> (1) call PlasmaClient::Create(id_object, data_size, 0, 0, , 1) // where 
> device_num != 0
> (2) call PlasmaClient::Seal(id_object)
> (3) call PlasmaClient::Release(id_object)
> (4) call PlasmaClient::Delete(id_object) // server carsh!
> *** Aborted at 1555645923 (unix time) try "date -d @1555645923" if you are 
> using GNU date ***
> PC: @ 0x7f65bcfa1428 gsignal
> *** SIGABRT (@0x3e86d67) received by PID 28007 (TID 0x7f65bf225740) from 
> PID 28007; stack trace: ***
>     @ 0x7f65bd347390 (unknown)
>     @ 0x7f65bcfa1428 gsignal
>     @ 0x7f65bcfa302a abort
>     @   0x4a56cd dlfree
>     @   0x4b4bc2 plasma::PlasmaAllocator::Free()
>     @   0x4b7da3 plasma::PlasmaStore::EraseFromObjectTable()
>     @   0x4b87d2 plasma::PlasmaStore::DeleteObject()
>     @   0x4bb3d2 plasma::PlasmaStore::ProcessMessage()
>     @   0x4b9195 _ZZN6plasma11PlasmaStore13ConnectClientEiENKUliE_clEi
>     @   0x4bd752 
> _ZNSt17_Function_handlerIFviEZN6plasma11PlasmaStore13ConnectClientEiEUliE_E9_M_invokeERKSt9_Any_dataOi
>     @   0x4ab998 std::function<>::operator()()
>     @   0x4aaea7 plasma::EventLoop::FileEventCallback()
>     @   0x4dbd8f aeProcessEvents
>     @   0x4dbf50 aeMain
>     @   0x4ab19b plasma::EventLoop::Start()
>     @   0x4bfc93 plasma::PlasmaStoreRunner::Start()
>     @   0x4bc34d plasma::StartServer()
>     @   0x4bcfbd main
>     @ 0x7f65bcf8c830 __libc_start_main
>     @   0x49e939 _start
>     @    0x0 (unknown)
> Aborted (core dumped)
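The crash above happens because the delete path frees the object through the CPU allocator (`dlfree`) even when it was created with a nonzero device_num, i.e. on a GPU. A toy Python model of the shape of the fix (a hypothetical sketch for illustration; the real fix lives in the C++ store):

```python
# Hypothetical model: the buggy path frees GPU-backed objects with the CPU
# allocator; the fixed path dispatches on the device the object lives on.
class PlasmaObject:
    def __init__(self, device_num):
        self.device_num = device_num  # 0 = host memory, >0 = GPU device

def free_object(obj, cpu_free, gpu_free):
    # Dispatch on device_num instead of unconditionally calling cpu_free,
    # which is what made the server abort inside dlfree().
    if obj.device_num == 0:
        cpu_free(obj)
    else:
        gpu_free(obj)

freed = []
free_object(PlasmaObject(device_num=1),
            cpu_free=lambda o: freed.append("cpu"),
            gpu_free=lambda o: freed.append("gpu"))
```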





[jira] [Created] (ARROW-5027) [Python] Add JSON Reader

2019-03-27 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-5027:
-

 Summary: [Python] Add JSON Reader
 Key: ARROW-5027
 URL: https://issues.apache.org/jira/browse/ARROW-5027
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Philipp Moritz


Add bindings for the JSON reader.





[jira] [Resolved] (ARROW-4939) [Python] Add wrapper for "sum" kernel

2019-03-27 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-4939.
---
   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3954
[https://github.com/apache/arrow/pull/3954]

> [Python] Add wrapper for "sum" kernel
> -
>
> Key: ARROW-4939
> URL: https://issues.apache.org/jira/browse/ARROW-4939
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.12.1
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Add pyarrow wrappers for the sum compute kernel.
> For this we also need to add wrappers for the new arrow::Scalar types and 
> appropriate conversions from Datum.





[jira] [Created] (ARROW-5022) [C++] Implement more "Datum" types for AggregateKernel

2019-03-26 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-5022:
-

 Summary: [C++] Implement more "Datum" types for AggregateKernel
 Key: ARROW-5022
 URL: https://issues.apache.org/jira/browse/ARROW-5022
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


Currently it gives the following error if the datum isn't an array:
{code:java}
AggregateKernel expects Array datum{code}





[jira] [Created] (ARROW-5002) [C++] Implement GroupBy

2019-03-24 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-5002:
-

 Summary: [C++] Implement GroupBy
 Key: ARROW-5002
 URL: https://issues.apache.org/jira/browse/ARROW-5002
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


Dear all,

I wonder what the best way forward is for implementing GroupBy kernels. 
Initially this was part of

https://issues.apache.org/jira/browse/ARROW-4124

but is not contained in the current implementation as far as I can tell.

It seems that the part of group by that just returns indices could be 
conveniently implemented with the HashKernel. That seems useful in any case. Is 
that indeed the best way forward/should this be done?

GroupBy + Aggregate could then be implemented either with that plus the Take 
kernel plus aggregation (which involves more memory copies than necessary), or 
as part of the aggregate kernel. The latter is probably preferred; any 
thoughts on that?

Am I missing any other JIRAs related to this?

Best, Philipp.
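To make the proposal concrete, here is a pure-Python sketch of the two pieces discussed above: a hash-based kernel that only returns group indices, and GroupBy + Aggregate built on top of it via take-then-reduce. This is an illustrative model only, not an Arrow API:

```python
# Sketch of "group by = hash kernel returning indices" (illustrative model,
# not actual Arrow kernels).
from collections import defaultdict

def group_indices(keys):
    """Map each distinct key to the row indices where it occurs
    (the part a HashKernel could provide)."""
    groups = defaultdict(list)
    for i, key in enumerate(keys):
        groups[key].append(i)
    return dict(groups)

def group_by_sum(keys, values):
    """GroupBy + aggregate via 'take': gather each group's rows, then sum.
    This is the extra-copy variant; a fused aggregate kernel would avoid
    materializing the taken values."""
    return {key: sum(values[i] for i in idx)  # 'take' then reduce
            for key, idx in group_indices(keys).items()}

result = group_by_sum(["a", "b", "a"], [1, 2, 3])  # {'a': 4, 'b': 2}
```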





[jira] [Created] (ARROW-4983) [Plasma] Unmap memory when the client is destroyed

2019-03-21 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-4983:
-

 Summary: [Plasma] Unmap memory when the client is destroyed
 Key: ARROW-4983
 URL: https://issues.apache.org/jira/browse/ARROW-4983
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Plasma
Affects Versions: 0.12.1
Reporter: Philipp Moritz
Assignee: Philipp Moritz


Currently the plasma memory mapped into the client is not unmapped upon 
destruction of the client, which can cause memory mapped files to be kept 
around longer than necessary.
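For illustration, the resource-management pattern being asked for maps onto Python's mmap like this (a hypothetical sketch, not the actual PlasmaClient code): the client owns the mapping and unmaps it explicitly when it is torn down.

```python
# Toy client that maps a file on connect and unmaps it on close, modeling the
# "unmap memory when the client is destroyed" behavior.
import mmap
import os
import tempfile

class Client:
    def __init__(self, path, size):
        self._fd = os.open(path, os.O_RDWR)
        self._mm = mmap.mmap(self._fd, size)

    def close(self):
        # Release the mapping when the client goes away, so the
        # memory-mapped file is not kept alive longer than necessary.
        self._mm.close()
        os.close(self._fd)

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * 4096)  # stand-in for the store's backing file
client = Client(f.name, 4096)
client.close()
os.unlink(f.name)
```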





[jira] [Updated] (ARROW-4958) [C++] Purely static linking broken

2019-03-18 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz updated ARROW-4958:
--
Component/s: C++

> [C++] Purely static linking broken
> --
>
> Key: ARROW-4958
> URL: https://issues.apache.org/jira/browse/ARROW-4958
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.12.1
>Reporter: Philipp Moritz
>Priority: Major
>
> On the current master, 816c10d030842a1a0da4d00f95a5e3749c86a74f (#3965), 
> running
>  
> {code:java}
> docker-compose build cpp
> docker-compose run cpp-static-only{code}
> yields
> {code:java}
> [357/382] Linking CXX executable debug/parquet-encoding-benchmark
> FAILED: debug/parquet-encoding-benchmark
> : && /opt/conda/bin/ccache /usr/bin/g++  -Wno-noexcept-type  
> -fdiagnostics-color=always -ggdb -O0  -Wall -Wno-conversion 
> -Wno-sign-conversion -Werror -msse4.2  -g  -rdynamic 
> src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o 
>  -o debug/parquet-encoding-benchmark  -Wl,-rpath,/opt/conda/lib 
> /opt/conda/lib/libbenchmark_main.a debug/libparquet.a 
> /opt/conda/lib/libbenchmark.a debug/libarrow.a 
> /opt/conda/lib/libdouble-conversion.a /opt/conda/lib/libbrotlienc.so 
> /opt/conda/lib/libbrotlidec.so /opt/conda/lib/libbrotlicommon.so 
> /opt/conda/lib/libbz2.so /opt/conda/lib/liblz4.so 
> /opt/conda/lib/libsnappy.so.1.1.7 /opt/conda/lib/libz.so 
> /opt/conda/lib/libzstd.so orc_ep-install/lib/liborc.a 
> /opt/conda/lib/libprotobuf.so /opt/conda/lib/libglog.so 
> /opt/conda/lib/libboost_system.so /opt/conda/lib/libboost_filesystem.so 
> jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc_pic.a -pthread -lrt 
> /opt/conda/lib/libboost_regex.so /opt/conda/lib/libthrift.so && :
> src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o:
>  In function `testing::AssertionResult::AppendMessage(testing::Message 
> const&)':
> /opt/conda/include/gtest/gtest.h:352: undefined reference to 
> `testing::Message::GetString[abi:cxx11]() const'
> src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o:
>  In function `parquet::BenchmarkDecodeArrow::InitDataInputs()':
> /arrow/cpp/src/parquet/encoding-benchmark.cc:201: undefined reference to 
> `arrow::random::RandomArrayGenerator::StringWithRepeats(long, long, int, int, 
> double)'
> src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o:
>  In function `parquet::BM_DictDecodingByteArray::DoEncodeData()':
> /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
> `testing::internal::AlwaysTrue()'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
> `testing::internal::AlwaysTrue()'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
> `testing::Message::Message()'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
> `testing::internal::AssertHelper::AssertHelper(testing::TestPartResult::Type, 
> char const*, int, char const*)'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
> `testing::internal::AssertHelper::operator=(testing::Message const&) const'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
> `testing::internal::AssertHelper::~AssertHelper()'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to 
> `testing::Message::Message()'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to 
> `testing::internal::AssertHelper::AssertHelper(testing::TestPartResult::Type, 
> char const*, int, char const*)'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to 
> `testing::internal::AssertHelper::operator=(testing::Message const&) const'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to 
> `testing::internal::AssertHelper::~AssertHelper()'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
> `testing::internal::AssertHelper::~AssertHelper()'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to 
> `testing::internal::AssertHelper::~AssertHelper()'
> src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o:
>  In function `testing::internal::scoped_ptr<std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > 
> >::reset(std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> >*)':
> /opt/conda/include/gtest/internal/gtest-port.h:1215: undefined reference to 
> `testing::internal::IsTrue(bool)'
> src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o:
>  In function `testing::AssertionResult 
> testing::internal::CmpHelperNE
>  >*, decltype(nullptr)>(char const*, char const*, 
> parquet::DictEncoder >* const&, 
> decltype(nullptr) const&)':
> /opt/conda/include/gtest/gtest.h:1573: undefined reference to 
> `testing::AssertionSuccess()'

[jira] [Updated] (ARROW-4958) [C++] Purely static linking broken

2019-03-18 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz updated ARROW-4958:
--
Affects Version/s: 0.12.1

> [C++] Purely static linking broken
> --
>
> Key: ARROW-4958
> URL: https://issues.apache.org/jira/browse/ARROW-4958
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.12.1
>Reporter: Philipp Moritz
>Priority: Major
>
> On the current master, 816c10d030842a1a0da4d00f95a5e3749c86a74f (#3965), 
> running
>  
> {code:java}
> docker-compose build cpp
> docker-compose run cpp-static-only{code}
> yields
> {code:java}
> [357/382] Linking CXX executable debug/parquet-encoding-benchmark
> FAILED: debug/parquet-encoding-benchmark
> : && /opt/conda/bin/ccache /usr/bin/g++  -Wno-noexcept-type  
> -fdiagnostics-color=always -ggdb -O0  -Wall -Wno-conversion 
> -Wno-sign-conversion -Werror -msse4.2  -g  -rdynamic 
> src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o 
>  -o debug/parquet-encoding-benchmark  -Wl,-rpath,/opt/conda/lib 
> /opt/conda/lib/libbenchmark_main.a debug/libparquet.a 
> /opt/conda/lib/libbenchmark.a debug/libarrow.a 
> /opt/conda/lib/libdouble-conversion.a /opt/conda/lib/libbrotlienc.so 
> /opt/conda/lib/libbrotlidec.so /opt/conda/lib/libbrotlicommon.so 
> /opt/conda/lib/libbz2.so /opt/conda/lib/liblz4.so 
> /opt/conda/lib/libsnappy.so.1.1.7 /opt/conda/lib/libz.so 
> /opt/conda/lib/libzstd.so orc_ep-install/lib/liborc.a 
> /opt/conda/lib/libprotobuf.so /opt/conda/lib/libglog.so 
> /opt/conda/lib/libboost_system.so /opt/conda/lib/libboost_filesystem.so 
> jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc_pic.a -pthread -lrt 
> /opt/conda/lib/libboost_regex.so /opt/conda/lib/libthrift.so && :
> src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o:
>  In function `testing::AssertionResult::AppendMessage(testing::Message 
> const&)':
> /opt/conda/include/gtest/gtest.h:352: undefined reference to 
> `testing::Message::GetString[abi:cxx11]() const'
> src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o:
>  In function `parquet::BenchmarkDecodeArrow::InitDataInputs()':
> /arrow/cpp/src/parquet/encoding-benchmark.cc:201: undefined reference to 
> `arrow::random::RandomArrayGenerator::StringWithRepeats(long, long, int, int, 
> double)'
> src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o:
>  In function `parquet::BM_DictDecodingByteArray::DoEncodeData()':
> /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
> `testing::internal::AlwaysTrue()'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
> `testing::internal::AlwaysTrue()'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
> `testing::Message::Message()'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
> `testing::internal::AssertHelper::AssertHelper(testing::TestPartResult::Type, 
> char const*, int, char const*)'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
> `testing::internal::AssertHelper::operator=(testing::Message const&) const'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
> `testing::internal::AssertHelper::~AssertHelper()'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to 
> `testing::Message::Message()'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to 
> `testing::internal::AssertHelper::AssertHelper(testing::TestPartResult::Type, 
> char const*, int, char const*)'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to 
> `testing::internal::AssertHelper::operator=(testing::Message const&) const'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to 
> `testing::internal::AssertHelper::~AssertHelper()'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
> `testing::internal::AssertHelper::~AssertHelper()'
> /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to 
> `testing::internal::AssertHelper::~AssertHelper()'
> src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o:
>  In function `testing::internal::scoped_ptr<std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > 
> >::reset(std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> >*)':
> /opt/conda/include/gtest/internal/gtest-port.h:1215: undefined reference to 
> `testing::internal::IsTrue(bool)'
> src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o:
>  In function `testing::AssertionResult 
> testing::internal::CmpHelperNE
>  >*, decltype(nullptr)>(char const*, char const*, 
> parquet::DictEncoder >* const&, 
> decltype(nullptr) const&)':
> /opt/conda/include/gtest/gtest.h:1573: undefined reference to 
> `testing::AssertionSuccess()'

[jira] [Created] (ARROW-4958) [C++] Purely static linking broken

2019-03-18 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-4958:
-

 Summary: [C++] Purely static linking broken
 Key: ARROW-4958
 URL: https://issues.apache.org/jira/browse/ARROW-4958
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


On the current master, 816c10d030842a1a0da4d00f95a5e3749c86a74f (#3965), running

 
{code:java}
docker-compose build cpp
docker-compose run cpp-static-only{code}
yields
{code:java}
[357/382] Linking CXX executable debug/parquet-encoding-benchmark

FAILED: debug/parquet-encoding-benchmark

: && /opt/conda/bin/ccache /usr/bin/g++  -Wno-noexcept-type  
-fdiagnostics-color=always -ggdb -O0  -Wall -Wno-conversion 
-Wno-sign-conversion -Werror -msse4.2  -g  -rdynamic 
src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o  
-o debug/parquet-encoding-benchmark  -Wl,-rpath,/opt/conda/lib 
/opt/conda/lib/libbenchmark_main.a debug/libparquet.a 
/opt/conda/lib/libbenchmark.a debug/libarrow.a 
/opt/conda/lib/libdouble-conversion.a /opt/conda/lib/libbrotlienc.so 
/opt/conda/lib/libbrotlidec.so /opt/conda/lib/libbrotlicommon.so 
/opt/conda/lib/libbz2.so /opt/conda/lib/liblz4.so 
/opt/conda/lib/libsnappy.so.1.1.7 /opt/conda/lib/libz.so 
/opt/conda/lib/libzstd.so orc_ep-install/lib/liborc.a 
/opt/conda/lib/libprotobuf.so /opt/conda/lib/libglog.so 
/opt/conda/lib/libboost_system.so /opt/conda/lib/libboost_filesystem.so 
jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc_pic.a -pthread -lrt 
/opt/conda/lib/libboost_regex.so /opt/conda/lib/libthrift.so && :

src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: 
In function `testing::AssertionResult::AppendMessage(testing::Message const&)':

/opt/conda/include/gtest/gtest.h:352: undefined reference to 
`testing::Message::GetString[abi:cxx11]() const'

src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: 
In function `parquet::BenchmarkDecodeArrow::InitDataInputs()':

/arrow/cpp/src/parquet/encoding-benchmark.cc:201: undefined reference to 
`arrow::random::RandomArrayGenerator::StringWithRepeats(long, long, int, int, 
double)'

src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: 
In function `parquet::BM_DictDecodingByteArray::DoEncodeData()':

/arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
`testing::internal::AlwaysTrue()'

/arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
`testing::internal::AlwaysTrue()'

/arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
`testing::Message::Message()'

/arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
`testing::internal::AssertHelper::AssertHelper(testing::TestPartResult::Type, 
char const*, int, char const*)'

/arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
`testing::internal::AssertHelper::operator=(testing::Message const&) const'

/arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
`testing::internal::AssertHelper::~AssertHelper()'

/arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to 
`testing::Message::Message()'

/arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to 
`testing::internal::AssertHelper::AssertHelper(testing::TestPartResult::Type, 
char const*, int, char const*)'

/arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to 
`testing::internal::AssertHelper::operator=(testing::Message const&) const'

/arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to 
`testing::internal::AssertHelper::~AssertHelper()'

/arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to 
`testing::internal::AssertHelper::~AssertHelper()'

/arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to 
`testing::internal::AssertHelper::~AssertHelper()'

src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: 
In function `testing::internal::scoped_ptr<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > 
>::reset(std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> >*)':

/opt/conda/include/gtest/internal/gtest-port.h:1215: undefined reference to 
`testing::internal::IsTrue(bool)'

src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: 
In function `testing::AssertionResult 
testing::internal::CmpHelperNE
 >*, decltype(nullptr)>(char const*, char const*, 
parquet::DictEncoder >* const&, 
decltype(nullptr) const&)':

/opt/conda/include/gtest/gtest.h:1573: undefined reference to 
`testing::AssertionSuccess()'

src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: 
In function 
`testing::internal::scoped_ptr<std::__cxx11::basic_stringstream<char, 
std::char_traits<char>, std::allocator<char> > 
>::reset(std::__cxx11::basic_stringstream<char, std::char_traits<char>, 
std::allocator<char> >*)':

/opt/conda/include/gtest/internal/gtest-port.h:1215: undefined reference to 
`testing::internal::IsTrue(bool)'


[jira] [Created] (ARROW-4939) [Python] Add wrapper for "sum" kernel

2019-03-17 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-4939:
-

 Summary: [Python] Add wrapper for "sum" kernel
 Key: ARROW-4939
 URL: https://issues.apache.org/jira/browse/ARROW-4939
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.12.1
Reporter: Philipp Moritz
Assignee: Philipp Moritz


Add pyarrow wrappers for the sum compute kernel.

For this we also need to add wrappers for the new arrow::Scalar types and 
appropriate conversions from Datum.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4912) [C++, Python] Allow specifying column names to CSV reader

2019-03-15 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz updated ARROW-4912:
--
Affects Version/s: 0.12.1

> [C++, Python] Allow specifying column names to CSV reader
> -
>
> Key: ARROW-4912
> URL: https://issues.apache.org/jira/browse/ARROW-4912
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.12.1
>Reporter: Philipp Moritz
>Priority: Major
>
> Currently I think there is no way to specify custom column names for CSV 
> files. It's possible to specify the full schema of the file, but not just 
> column names.
> See the related discussion here: ARROW-3722
> The goal of this is to re-use the CSV type-inference but still allow people 
> to specify custom names for the columns. As far as I know, there is currently 
> no way to set column names post-hoc, so we should provide a way to specify 
> them before reading the file.
> Related to this, ParseOptions(header_rows=0) is not currently implemented.
> Is there any current way to do this or does this need to be implemented?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4912) [C++, Python] Allow specifying column names to CSV reader

2019-03-15 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz updated ARROW-4912:
--
Component/s: Python
 C++

> [C++, Python] Allow specifying column names to CSV reader
> -
>
> Key: ARROW-4912
> URL: https://issues.apache.org/jira/browse/ARROW-4912
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Affects Versions: 0.12.1
>Reporter: Philipp Moritz
>Priority: Major
>
> Currently I think there is no way to specify custom column names for CSV 
> files. It's possible to specify the full schema of the file, but not just 
> column names.
> See the related discussion here: ARROW-3722
> The goal of this is to re-use the CSV type-inference but still allow people 
> to specify custom names for the columns. As far as I know, there is currently 
> no way to set column names post-hoc, so we should provide a way to specify 
> them before reading the file.
> Related to this, ParseOptions(header_rows=0) is not currently implemented.
> Is there any current way to do this or does this need to be implemented?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4912) [C++, Python] Allow specifying column names to CSV reader

2019-03-15 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-4912:
-

 Summary: [C++, Python] Allow specifying column names to CSV reader
 Key: ARROW-4912
 URL: https://issues.apache.org/jira/browse/ARROW-4912
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


Currently I think there is no way to specify custom column names for CSV files. 
It's possible to specify the full schema of the file, but not just column names.

See the related discussion here: ARROW-3722

The goal of this is to re-use the CSV type-inference but still allow people to 
specify custom names for the columns. As far as I know, there is currently no 
way to set column names post-hoc, so we should provide a way to specify them 
before reading the file.

Related to this, ParseOptions(header_rows=0) is not currently implemented.

Is there any current way to do this or does this need to be implemented?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4810) [Format][C++] Add "LargeList" type with 64-bit offsets

2019-03-10 Thread Philipp Moritz (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789169#comment-16789169
 ] 

Philipp Moritz commented on ARROW-4810:
---

I agree that we should have a reference implementation in both C++ and Java. Is 
there anybody who can help with this? I don't personally have the expertise to 
do it unfortunately (and would also like to avoid this becoming a zombie 
project like https://issues.apache.org/jira/browse/ARROW-1692).

> [Format][C++] Add "LargeList" type with 64-bit offsets
> --
>
> Key: ARROW-4810
> URL: https://issues.apache.org/jira/browse/ARROW-4810
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Format
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Mentioned in https://github.com/apache/arrow/issues/3845



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4757) [C++] Nested chunked array support

2019-03-07 Thread Philipp Moritz (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787476#comment-16787476
 ] 

Philipp Moritz commented on ARROW-4757:
---

Yeah, that's a good idea, I'll give that a shot. I started making ChunkedArray 
a subclass of Array and introducing a type so it can be serialized via the IPC 
mechanism, but it is a little clumsy.

> [C++] Nested chunked array support
> --
>
> Key: ARROW-4757
> URL: https://issues.apache.org/jira/browse/ARROW-4757
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Priority: Major
> Fix For: 0.14.0
>
>
> Dear all,
> I'm currently trying to lift the 2GB limit on the python serialization. For 
> this, I implemented a chunked union builder to split the array into smaller 
> arrays.
> However, some of the children of the union array can be ListArrays, which can 
> themselves contain UnionArrays which can contain ListArrays etc. I'm at a bit 
> of a loss how to handle this. In principle I'd like to chunk the children 
> too. However, currently UnionArrays can only have children of type Array, and 
> there is no way to treat a chunked array (which is a vector of Arrays) as an 
> Array to store it as a child of a UnionArray. Any ideas how to best support 
> this use case?
> -- Philipp.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4797) [Plasma] Avoid store crash if not enough memory is available

2019-03-07 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-4797:
-

 Summary: [Plasma] Avoid store crash if not enough memory is 
available
 Key: ARROW-4797
 URL: https://issues.apache.org/jira/browse/ARROW-4797
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


Currently, the plasma server exits with a fatal check if not enough memory is 
available. This can lead to errors that are hard to diagnose, see

[https://github.com/ray-project/ray/issues/3670]

Instead, we should keep the store alive in these circumstances, consuming some 
of the remaining memory, and allow the client to check whether enough memory 
has been allocated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4797) [Plasma] Avoid store crash if not enough memory is available

2019-03-07 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz updated ARROW-4797:
--
Component/s: C++ - Plasma

> [Plasma] Avoid store crash if not enough memory is available
> 
>
> Key: ARROW-4797
> URL: https://issues.apache.org/jira/browse/ARROW-4797
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Reporter: Philipp Moritz
>Priority: Major
>
> Currently, the plasma server exits with a fatal check if not enough memory 
> is available. This can lead to errors that are hard to diagnose, see
> [https://github.com/ray-project/ray/issues/3670]
> Instead, we should keep the store alive in these circumstances, consuming 
> some of the remaining memory, and allow the client to check whether enough 
> memory has been allocated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-4797) [Plasma] Avoid store crash if not enough memory is available

2019-03-07 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz reassigned ARROW-4797:
-

Assignee: Philipp Moritz

> [Plasma] Avoid store crash if not enough memory is available
> 
>
> Key: ARROW-4797
> URL: https://issues.apache.org/jira/browse/ARROW-4797
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>
> Currently, the plasma server exits with a fatal check if not enough memory 
> is available. This can lead to errors that are hard to diagnose, see
> [https://github.com/ray-project/ray/issues/3670]
> Instead, we should keep the store alive in these circumstances, consuming 
> some of the remaining memory, and allow the client to check whether enough 
> memory has been allocated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4757) Nested chunked array support

2019-03-04 Thread Philipp Moritz (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783141#comment-16783141
 ] 

Philipp Moritz commented on ARROW-4757:
---

One possible way to solve this is to make ChunkedArray a first class citizen, 
i.e. make it a subclass of Array and allow it to participate in IPC. Then the 
UnionArray could just have a ChunkedArray as a child to solve the above issue.

> Nested chunked array support
> 
>
> Key: ARROW-4757
> URL: https://issues.apache.org/jira/browse/ARROW-4757
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Priority: Major
>
> Dear all,
> I'm currently trying to lift the 2GB limit on the python serialization. For 
> this, I implemented a chunked union builder to split the array into smaller 
> arrays.
> However, some of the children of the union array can be ListArrays, which can 
> themselves contain UnionArrays which can contain ListArrays etc. I'm at a bit 
> of a loss how to handle this. In principle I'd like to chunk the children 
> too. However, currently UnionArrays can only have children of type Array, and 
> there is no way to treat a chunked array (which is a vector of Arrays) as an 
> Array to store it as a child of a UnionArray. Any ideas how to best support 
> this use case?
> -- Philipp.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4757) Nested chunked array support

2019-03-04 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-4757:
-

 Summary: Nested chunked array support
 Key: ARROW-4757
 URL: https://issues.apache.org/jira/browse/ARROW-4757
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


Dear all,

I'm currently trying to lift the 2GB limit on the python serialization. For 
this, I implemented a chunked union builder to split the array into smaller 
arrays.

However, some of the children of the union array can be ListArrays, which can 
themselves contain UnionArrays which can contain ListArrays etc. I'm at a bit 
of a loss how to handle this. In principle I'd like to chunk the children too. 
However, currently UnionArrays can only have children of type Array, and there 
is no way to treat a chunked array (which is a vector of Arrays) as an Array to 
store it as a child of a UnionArray. Any ideas how to best support this use 
case?

-- Philipp.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4690) Building TensorFlow compatible wheels for Arrow

2019-02-26 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-4690:
-

 Summary: Building TensorFlow compatible wheels for Arrow
 Key: ARROW-4690
 URL: https://issues.apache.org/jira/browse/ARROW-4690
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


Since the inclusion of LLVM, Arrow wheels stopped working with TensorFlow again 
(on some configurations, at least).

While we are continuing to discuss a more permanent solution in 
[https://groups.google.com/a/tensorflow.org/d/topic/developers/TMqRaT-H2bI/discussion],
 I made some progress in creating TensorFlow-compatible wheels for an 
unmodified pyarrow.

They won't adhere to the manylinux1 standard, but they should be as compatible 
as the TensorFlow wheels because they use the same build environment (Ubuntu 
14.04).

I'll create a PR with the necessary changes. I don't propose to ship these 
wheels, but it might be a good idea to include the Docker image and 
instructions on how to build them in the tree, for organizations that want to 
use TensorFlow with pyarrow on top of pip. For now, the official recommendation 
should probably be to use conda if the average user wants to do this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4491) [Python] Remove usage of std::to_string and std::stoi

2019-02-05 Thread Philipp Moritz (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761501#comment-16761501
 ] 

Philipp Moritz commented on ARROW-4491:
---

OK, I think I understand this now. On some implementations, int8_t seems to be 
a typedef for a character type, so the conversion in this case produces a 
character rather than a number.

> [Python] Remove usage of std::to_string and std::stoi
> -
>
> Key: ARROW-4491
> URL: https://issues.apache.org/jira/browse/ARROW-4491
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Not sure why this is happening, but for some older compilers I'm seeing
> {code:java}
> terminate called after throwing an instance of 'std::invalid_argument'
>   what():  stoi{code}
> since [https://github.com/apache/arrow/pull/3423].
> A possible cause is that there is no int8_t overload of 
> [https://en.cppreference.com/w/cpp/string/basic_string/to_string], so it 
> might not convert the value to a proper string representation of the number.
> Any insight on why this could be happening is appreciated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-1266) [Plasma] Move heap allocations to arrow memory pool

2019-02-05 Thread Philipp Moritz (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761440#comment-16761440
 ] 

Philipp Moritz commented on ARROW-1266:
---

At a quick glance, the only structure that is still allocated with new is 
GetRequest here: 
[https://github.com/apache/arrow/blob/0c55b25c84119af59320eab0b0625da9ce987294/cpp/src/plasma/store.cc#L404]

The control flow is pretty clear and it is deleted here: 
[https://github.com/apache/arrow/blob/0c55b25c84119af59320eab0b0625da9ce987294/cpp/src/plasma/store.cc#L296]

However, if somebody wants to make it a unique_ptr or shared_ptr ([~suquark]?), 
I wouldn't mind.

> [Plasma] Move heap allocations to arrow memory pool
> ---
>
> Key: ARROW-1266
> URL: https://issues.apache.org/jira/browse/ARROW-1266
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Philipp Moritz
>Priority: Major
> Fix For: 0.13.0
>
>
> At the moment we are allocating memory with std::vectors and even new in some 
> places, this should be cleaned up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4491) [Python] Remove usage of std::to_string and std::stoi

2019-02-05 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-4491:
-

 Summary: [Python] Remove usage of std::to_string and std::stoi
 Key: ARROW-4491
 URL: https://issues.apache.org/jira/browse/ARROW-4491
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


Not sure why this is happening, but for some older compilers I'm seeing
{code:java}
terminate called after throwing an instance of 'std::invalid_argument'
  what():  stoi{code}
since [https://github.com/apache/arrow/pull/3423].

A possible cause is that there is no int8_t overload of 
[https://en.cppreference.com/w/cpp/string/basic_string/to_string], so it might 
not convert the value to a proper string representation of the number.

Any insight on why this could be happening is appreciated.





[jira] [Created] (ARROW-4475) [Python] Serializing objects that contain themselves

2019-02-04 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-4475:
-

 Summary: [Python] Serializing objects that contain themselves
 Key: ARROW-4475
 URL: https://issues.apache.org/jira/browse/ARROW-4475
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


This is a regression from [https://github.com/apache/arrow/pull/3423]

The following segfaults:
{code:java}
import pyarrow as pa
lst = []
lst.append(lst)
pa.serialize(lst){code}
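Until this is fixed, a caller-side guard can reject self-referential containers before they reach pa.serialize. A minimal sketch, assuming nothing about pyarrow internals (contains_cycle is a hypothetical helper, not part of the pyarrow API):
{code:java}
def contains_cycle(obj, _active=None):
    # Track container ids on the current recursion path; revisiting an
    # id means the structure references itself (as in lst.append(lst)).
    if _active is None:
        _active = set()
    if isinstance(obj, (list, tuple, set, dict)):
        if id(obj) in _active:
            return True
        _active.add(id(obj))
        items = obj.values() if isinstance(obj, dict) else obj
        found = any(contains_cycle(item, _active) for item in items)
        _active.discard(id(obj))
        return found
    return False

lst = []
lst.append(lst)
assert contains_cycle(lst)
assert not contains_cycle([1, [2, {"a": 3}]])
{code}
Only ids on the active path are tracked, so shared but acyclic substructures are not falsely flagged.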





[jira] [Resolved] (ARROW-4455) [Plasma] g++ 8 reports class-memaccess warnings

2019-02-01 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-4455.
---
   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3543
[https://github.com/apache/arrow/pull/3543]

> [Plasma] g++ 8 reports class-memaccess warnings
> ---
>
> Key: ARROW-4455
> URL: https://issues.apache.org/jira/browse/ARROW-4455
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-4452) [Python] Serializing sparse torch tensors

2019-02-01 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz updated ARROW-4452:
--
Description: 
Using the pytorch serialization handler on sparse Tensors:
{code:java}
import torch
i = torch.LongTensor([[0, 2], [1, 0], [1, 2]])
v = torch.FloatTensor([3,      4,      5    ])
tensor = torch.sparse.FloatTensor(i.t(), v, torch.Size([2,3]))

pyarrow.serialization.register_torch_serialization_handlers(pyarrow.serialization._default_serialization_context)

s = pyarrow.serialize(tensor, 
context=pyarrow.serialization._default_serialization_context) {code}
Produces this result:
{code:java}
TypeError: can't convert sparse tensor to numpy. Use Tensor.to_dense() to 
convert to a dense tensor first.{code}
We should provide a way to serialize sparse torch tensors, especially now that 
we are getting support for sparse Tensors.

  was:
Using the pytorch serialization handler on sparse Tensors:
{code:java}
import torch
i = torch.LongTensor([[0, 2], [1, 0], [1, 2]])
v = torch.FloatTensor([3,      4,      5    ])
tensor = torch.sparse.FloatTensor(i.t(), v, torch.Size([2,3]))

register_torch_serialization_handlers(pyarrow.serialization._default_serialization_context)

s = pyarrow.serialize(tensor, 
context=pyarrow.serialization._default_serialization_context) {code}
Produces this result:
{code:java}
TypeError: can't convert sparse tensor to numpy. Use Tensor.to_dense() to 
convert to a dense tensor first.{code}
We should provide a way to serialize sparse torch tensors, especially now that 
we are getting support for sparse Tensors.


> [Python] Serializing sparse torch tensors
> -
>
> Key: ARROW-4452
> URL: https://issues.apache.org/jira/browse/ARROW-4452
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Priority: Major
>
> Using the pytorch serialization handler on sparse Tensors:
> {code:java}
> import torch
> i = torch.LongTensor([[0, 2], [1, 0], [1, 2]])
> v = torch.FloatTensor([3,      4,      5    ])
> tensor = torch.sparse.FloatTensor(i.t(), v, torch.Size([2,3]))
> pyarrow.serialization.register_torch_serialization_handlers(pyarrow.serialization._default_serialization_context)
> s = pyarrow.serialize(tensor, 
> context=pyarrow.serialization._default_serialization_context) {code}
> Produces this result:
> {code:java}
> TypeError: can't convert sparse tensor to numpy. Use Tensor.to_dense() to 
> convert to a dense tensor first.{code}
> We should provide a way to serialize sparse torch tensors, especially now 
> that we are getting support for sparse Tensors.





[jira] [Created] (ARROW-4452) [Python] Serializing sparse torch tensors

2019-02-01 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-4452:
-

 Summary: [Python] Serializing sparse torch tensors
 Key: ARROW-4452
 URL: https://issues.apache.org/jira/browse/ARROW-4452
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


Using the pytorch serialization handler on sparse Tensors:
{code:java}
import torch
i = torch.LongTensor([[0, 2], [1, 0], [1, 2]])
v = torch.FloatTensor([3,      4,      5    ])
tensor = torch.sparse.FloatTensor(i.t(), v, torch.Size([2,3]))

register_torch_serialization_handlers(pyarrow.serialization._default_serialization_context)

s = pyarrow.serialize(tensor, 
context=pyarrow.serialization._default_serialization_context) {code}
Produces this result:
{code:java}
TypeError: can't convert sparse tensor to numpy. Use Tensor.to_dense() to 
convert to a dense tensor first.{code}
We should provide a way to serialize sparse torch tensors, especially now that 
we are getting support for sparse Tensors.
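One possible direction, sketched without torch so it stays self-contained: a handler pair that decomposes a sparse tensor into plain indices/values/shape and rebuilds it on deserialization. The function names here are illustrative assumptions, not an existing pyarrow API:
{code:java}
# Sketch: serialize the dense components that pyarrow can already
# handle; with torch available, the deserializer would instead call
# torch.sparse.FloatTensor(indices, values, torch.Size(shape)).
def _serialize_sparse(indices, values, shape):
    return {"indices": indices, "values": values, "shape": list(shape)}

def _deserialize_sparse(data):
    return data["indices"], data["values"], tuple(data["shape"])

payload = _serialize_sparse([[0, 1, 1], [2, 0, 2]], [3.0, 4.0, 5.0], (2, 3))
assert _deserialize_sparse(payload) == ([[0, 1, 1], [2, 0, 2]],
                                        [3.0, 4.0, 5.0], (2, 3))
{code}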





[jira] [Resolved] (ARROW-4422) [Plasma] Enforce memory limit in plasma, rather than relying on dlmalloc_set_footprint_limit

2019-01-30 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-4422.
---
Resolution: Fixed

Issue resolved by pull request 3526
[https://github.com/apache/arrow/pull/3526]

> [Plasma] Enforce memory limit in plasma, rather than relying on 
> dlmalloc_set_footprint_limit
> 
>
> Key: ARROW-4422
> URL: https://issues.apache.org/jira/browse/ARROW-4422
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Plasma (C++)
>Affects Versions: 0.12.0
>Reporter: Anurag Khandelwal
>Assignee: Anurag Khandelwal
>Priority: Minor
> Fix For: 0.13.0
>
>
> Currently, Plasma relies on dlmalloc_set_footprint_limit to limit the memory 
> utilization for Plasma Store. This is restrictive because:
>  * It restricts Plasma to dlmalloc, which supports limiting memory footprint, 
> as opposed to other, potentially more performant malloc implementations 
> (e.g., jemalloc)
>  * dlmalloc_set_footprint_limit does not guarantee that the limit it sets is 
> the amount of _usable_ memory. As such, we might trigger evictions much 
> earlier than hitting this limit, e.g., due to fragmentation or metadata 
> overheads.
> To overcome this, we can impose the memory limit at Plasma by tracking the 
> number of bytes allocated and freed using malloc and free calls. Whenever the 
> allocation reaches the set limit, we fail any subsequent allocations (i.e., 
> return NULL from malloc). This allows Plasma to not be tied to dlmalloc, and 
> also provides more accurate tracking of memory allocation/capacity. 
> Caveat: We will need to make sure that the mmaped files are living on a file 
> system that is a bit larger (depending on malloc implementation) than the 
> Plasma memory limit to account for the extra memory required due to 
> fragmentation/metadata overheads.
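The proposed accounting can be sketched independently of any malloc implementation. This is a toy model of the bookkeeping, not the actual Plasma change:
{code:java}
class TrackingAllocator:
    # Count bytes through the malloc/free wrappers and fail any
    # allocation that would push usage past the configured cap; the
    # store would respond to a failed malloc by evicting objects.
    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.in_use = 0

    def malloc(self, nbytes):
        if self.in_use + nbytes > self.limit_bytes:
            return None  # signal the caller to evict and retry
        self.in_use += nbytes
        return bytearray(nbytes)  # stand-in for a real pointer

    def free(self, nbytes):
        self.in_use -= nbytes

alloc = TrackingAllocator(limit_bytes=100)
assert alloc.malloc(60) is not None
assert alloc.malloc(50) is None   # 60 + 50 exceeds the 100-byte cap
alloc.free(60)
assert alloc.malloc(50) is not None
{code}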





[jira] [Assigned] (ARROW-4379) Register pyarrow serializers for collections.Counter and collections.deque.

2019-01-27 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz reassigned ARROW-4379:
-

Assignee: Robert Nishihara

> Register pyarrow serializers for collections.Counter and collections.deque.
> ---
>
> Key: ARROW-4379
> URL: https://issues.apache.org/jira/browse/ARROW-4379
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Robert Nishihara
>Assignee: Robert Nishihara
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-4379) Register pyarrow serializers for collections.Counter and collections.deque.

2019-01-27 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-4379.
---
   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3489
[https://github.com/apache/arrow/pull/3489]

> Register pyarrow serializers for collections.Counter and collections.deque.
> ---
>
> Key: ARROW-4379
> URL: https://issues.apache.org/jira/browse/ARROW-4379
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Robert Nishihara
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-4378) [Plasma] Release objects upon Create

2019-01-25 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-4378:
-

 Summary: [Plasma] Release objects upon Create
 Key: ARROW-4378
 URL: https://issues.apache.org/jira/browse/ARROW-4378
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Plasma (C++)
Affects Versions: 0.13.0
Reporter: Philipp Moritz


Similar to the way that
{code:java}
Get(const std::vector<ObjectID>& object_ids, int64_t timeout_ms,
    std::vector<ObjectBuffer>* out){code}
releases the object when the shared_ptr<Buffer> inside of ObjectBuffer 
goes out of scope, the same should happen for
{code:java}
Status Create(const ObjectID& object_id, int64_t data_size, const uint8_t* metadata,
              int64_t metadata_size, std::shared_ptr<Buffer>* data);
{code}
At the moment people have to remember to call Release() after they have created 
and sealed the object, which can make the C++ API cumbersome to use.
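The Get() behavior can be mimicked by tying release to scope exit. A sketch of the idea (CreatedBuffer and FakeClient are illustrative stand-ins, not the plasma client API):
{code:java}
class CreatedBuffer:
    # Tie Release() to scope exit, as Get()'s shared_ptr<Buffer> already
    # does, so callers cannot forget to release after Create() + Seal().
    def __init__(self, client, object_id):
        self.client = client
        self.object_id = object_id

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.client.release(self.object_id)

class FakeClient:
    def __init__(self):
        self.released = []

    def release(self, object_id):
        self.released.append(object_id)

client = FakeClient()
with CreatedBuffer(client, "object-1"):
    pass  # write data and seal the object here
assert client.released == ["object-1"]
{code}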

Thanks to [~anuragkh] for reporting this.





[jira] [Resolved] (ARROW-4236) [JAVA] Distinct plasma client create exceptions

2019-01-23 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-4236.
---
   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3306
[https://github.com/apache/arrow/pull/3306]

> [JAVA] Distinct plasma client create exceptions
> ---
>
> Key: ARROW-4236
> URL: https://issues.apache.org/jira/browse/ARROW-4236
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Yuhong Guo
>Assignee: Lin Yuan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> [PR|https://github.com/apache/arrow/pull/3306]





[jira] [Commented] (ARROW-4319) plasma/store.h pulls ins flatbuffer dependency

2019-01-22 Thread Philipp Moritz (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748600#comment-16748600
 ] 

Philipp Moritz commented on ARROW-4319:
---

Hey Matthias,

It should be possible to remove this dependency by shuffling around/forward 
declaring a few things. Do you want to submit a PR? Let me know if you run into 
any issues that require a deeper refactor.

-- Philipp.

> plasma/store.h pulls ins flatbuffer dependency
> --
>
> Key: ARROW-4319
> URL: https://issues.apache.org/jira/browse/ARROW-4319
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Matthias Vallentin
>Priority: Minor
>  Labels: build
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> For our unit testing, we'd like to use the plasma store programmatically by 
> including *plasma/store.h*. However, this header pulls in flatbuffers via 
> *src/plasma/common_generated.h*. Is this a necessary include or would a 
> forward declaration suffice?
> Installing just flatbuffers didn't solve the problem, though. It looks like a 
> specific version is needed:
> {noformat}
>  In file included from /Users/mavam/code/src/arrow/cpp/src/plasma/store.h:30:
>  In file included from 
> /Users/mavam/code/src/arrow/cpp/src/plasma/eviction_policy.h:27:
>  In file included from /Users/mavam/code/src/arrow/cpp/src/plasma/plasma.h:41:
>  /Users/mavam/code/src/arrow/cpp/src/plasma/common_generated.h:65:21: error: 
> no matching member function for call to 'Verify'
>  verifier.Verify(object_id()) &&
>  ~^~
>  /usr/local/include/flatbuffers/flatbuffers.h:1896:29: note: candidate 
> template ignored: couldn't infer template argument 'T'
>  template<typename T> bool Verify(size_t elem) const {
>  ^
>  /usr/local/include/flatbuffers/flatbuffers.h:1905:29: note: candidate 
> function template not viable: requires 2 arguments, but 1 was provided
>  template<typename T> bool Verify(const uint8_t *base, voffset_t elem_off)
>  ^
>  /usr/local/include/flatbuffers/flatbuffers.h:1880:8: note: candidate 
> function not viable: requires 2 arguments, but 1 was provided
>  bool Verify(size_t elem, size_t elem_len) const {
>  ^
>  /usr/local/include/flatbuffers/flatbuffers.h:1901:8: note: candidate 
> function not viable: requires 3 arguments, but 1 was provided
>  bool Verify(const uint8_t *base, voffset_t elem_off, size_t elem_len) const {
> {noformat}





[jira] [Assigned] (ARROW-4295) [Plasma] Incorrect log message when evicting objects

2019-01-19 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz reassigned ARROW-4295:
-

Assignee: Anurag Khandelwal

> [Plasma] Incorrect log message when evicting objects
> 
>
> Key: ARROW-4295
> URL: https://issues.apache.org/jira/browse/ARROW-4295
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Plasma (C++)
>Affects Versions: 0.11.1
>Reporter: Anurag Khandelwal
>Assignee: Anurag Khandelwal
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When Plasma evicts objects on running out of memory, it prints log messages 
> of the form:
> {quote}There is not enough space to create this object, so evicting x objects 
> to free up y bytes. The number of bytes in use (before this eviction) is 
> z.{quote}
> However, the reported number of bytes in use (before this eviction) actually 
> reports the number of bytes *after* the eviction. A straightforward fix is to 
> simply replace z with (y+z).





[jira] [Resolved] (ARROW-4295) [Plasma] Incorrect log message when evicting objects

2019-01-19 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-4295.
---
Resolution: Fixed

Issue resolved by pull request 3433
[https://github.com/apache/arrow/pull/3433]

> [Plasma] Incorrect log message when evicting objects
> 
>
> Key: ARROW-4295
> URL: https://issues.apache.org/jira/browse/ARROW-4295
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Plasma (C++)
>Affects Versions: 0.11.1
>Reporter: Anurag Khandelwal
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When Plasma evicts objects on running out of memory, it prints log messages 
> of the form:
> {quote}There is not enough space to create this object, so evicting x objects 
> to free up y bytes. The number of bytes in use (before this eviction) is 
> z.{quote}
> However, the reported number of bytes in use (before this eviction) actually 
> reports the number of bytes *after* the eviction. A straightforward fix is to 
> simply replace z with (y+z).





[jira] [Created] (ARROW-4285) [Python] Use proper builder interface for serialization

2019-01-17 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-4285:
-

 Summary: [Python] Use proper builder interface for serialization
 Key: ARROW-4285
 URL: https://issues.apache.org/jira/browse/ARROW-4285
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.12.0
Reporter: Philipp Moritz


As a preparation for ARROW-3919, refactor the python serialization code such 
that the default builder interface is used. In the next step we can then plug 
in ChunkedBuilders to make sure that the generated arrays are properly chunked.





[jira] [Created] (ARROW-4269) [Python] AttributeError: module 'pandas.core' has no attribute 'arrays'

2019-01-15 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-4269:
-

 Summary: [Python] AttributeError: module 'pandas.core' has no 
attribute 'arrays'
 Key: ARROW-4269
 URL: https://issues.apache.org/jira/browse/ARROW-4269
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


This happens with pandas 0.22:

{code:java}
In [1]: import pyarrow
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-...> in <module>()
----> 1 import pyarrow

~/arrow/python/pyarrow/__init__.py in <module>()
    174 localfs = LocalFileSystem.get_instance()
    175
--> 176 from pyarrow.serialization import (default_serialization_context,
    177     register_default_serialization_handlers,
    178     register_torch_serialization_handlers)

~/arrow/python/pyarrow/serialization.py in <module>()
    303
    304
--> 305 register_default_serialization_handlers(_default_serialization_context)

~/arrow/python/pyarrow/serialization.py in register_default_serialization_handlers(serialization_context)
    294     custom_deserializer=_deserialize_pyarrow_table)
    295
--> 296 _register_custom_pandas_handlers(serialization_context)
    297
    298

~/arrow/python/pyarrow/serialization.py in _register_custom_pandas_handlers(context)
    175     custom_deserializer=_load_pickle_from_buffer)
    176
--> 177 if hasattr(pd.core.arrays, 'interval'):
    178     context.register_type(
    179         pd.core.arrays.interval.IntervalArray,

AttributeError: module 'pandas.core' has no attribute 'arrays'
{code}
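A defensive lookup would avoid the hard failure on older pandas layouts. A sketch of the pattern, with SimpleNamespace standing in for the pandas 0.22 module tree (not the fix that landed):
{code:java}
from types import SimpleNamespace

# Stand-in for pandas 0.22, where pd.core exists but has no "arrays".
pd_core = SimpleNamespace(series=object())

# Chained access pd.core.arrays raises AttributeError; getattr with a
# default degrades gracefully, so import-time registration can skip the
# interval handlers instead of crashing the whole pyarrow import.
arrays_mod = getattr(pd_core, "arrays", None)
interval_mod = getattr(arrays_mod, "interval", None)
assert interval_mod is None

# On a newer pandas layout the same code finds the handlers.
pd_core_new = SimpleNamespace(arrays=SimpleNamespace(interval="IntervalArray"))
assert getattr(getattr(pd_core_new, "arrays", None),
               "interval", None) == "IntervalArray"
{code}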





[jira] [Created] (ARROW-4249) [Plasma] Remove reference to logging.h from plasma/common.h

2019-01-13 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-4249:
-

 Summary: [Plasma] Remove reference to logging.h from 
plasma/common.h
 Key: ARROW-4249
 URL: https://issues.apache.org/jira/browse/ARROW-4249
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Plasma (C++)
Affects Versions: 0.11.1
Reporter: Philipp Moritz
Assignee: Philipp Moritz
 Fix For: 0.13.0


It is not needed there, and it pollutes the namespace of applications that use 
the plasma client with arrow's DCHECK macros (DCHECK is a name widely used in 
other projects).





[jira] [Assigned] (ARROW-4236) [JAVA] Distinct plasma client create exceptions

2019-01-10 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz reassigned ARROW-4236:
-

Assignee: Lin Yuan

> [JAVA] Distinct plasma client create exceptions
> ---
>
> Key: ARROW-4236
> URL: https://issues.apache.org/jira/browse/ARROW-4236
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Yuhong Guo
>Assignee: Lin Yuan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [PR|https://github.com/apache/arrow/pull/3306]





[jira] [Created] (ARROW-4217) [Plasma] Remove custom object metadata

2019-01-09 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-4217:
-

 Summary: [Plasma] Remove custom object metadata
 Key: ARROW-4217
 URL: https://issues.apache.org/jira/browse/ARROW-4217
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Plasma (C++)
Affects Versions: 0.11.1
Reporter: Philipp Moritz
Assignee: Philipp Moritz
 Fix For: 0.13.0


Currently, Plasma supports custom metadata for objects. This doesn't seem to be 
used at the moment, and it will simplify the interface and implementation to 
remove it. Removing the custom metadata will also make eviction to other blob 
stores easier (most other stores don't support custom metadata).

My personal use case was to store arrow schemata in there, but they are now 
stored as part of the object itself.

If nobody else is using this, I'd suggest removing it. If people really want 
metadata, they could always store it as a separate object if desired.

 





[jira] [Updated] (ARROW-4217) [Plasma] Remove custom object metadata

2019-01-09 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz updated ARROW-4217:
--
Description: 
Currently, Plasma supports custom metadata for objects. This doesn't seem to be 
used at the moment, and removing it will simplify the interface and 
implementation of plasma. Removing the custom metadata will also make eviction 
to other blob stores easier (most other stores don't support custom metadata).

My personal use case was to store arrow schemata in there, but they are now 
stored as part of the object itself.

If nobody else is using this, I'd suggest removing it. If people really want 
metadata, they could always store it as a separate object if desired.

 

  was:
Currently, Plasma supports custom metadata for objects. This doesn't seem to be 
used at the moment, and it will simplify the interface and implementation to 
remove it. Removing the custom metadata will also make eviction to other blob 
stores easier (most other stores don't support custom metadata).

My personal use case was to store arrow schemata in there, but they are now 
stored as part of the object itself.

If nobody else is using this, I'd suggest removing it. If people really want 
metadata, they could always store it as a separate object if desired.

 


> [Plasma] Remove custom object metadata
> --
>
> Key: ARROW-4217
> URL: https://issues.apache.org/jira/browse/ARROW-4217
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Affects Versions: 0.11.1
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Minor
> Fix For: 0.13.0
>
>
> Currently, Plasma supports custom metadata for objects. This doesn't seem to 
> be used at the moment, and removing it will simplify the interface and 
> implementation of plasma. Removing the custom metadata will also make 
> eviction to other blob stores easier (most other stores don't support custom 
> metadata).
> My personal use case was to store arrow schemata in there, but they are now 
> stored as part of the object itself.
> If nobody else is using this, I'd suggest removing it. If people really want 
> metadata, they could always store it as a separate object if desired.
>  





[jira] [Commented] (ARROW-4024) [Python] Cython compilation error on cython==0.27.3

2018-12-17 Thread Philipp Moritz (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723449#comment-16723449
 ] 

Philipp Moritz commented on ARROW-4024:
---

I think it's fine to drop support for 0.27. I just wanted to point it out 
because from

[https://github.com/apache/arrow/blob/9da458437162574f3e0d82e4a51dc6c1589b9f94/python/setup.py#L45]

[https://github.com/apache/arrow/blob/9da458437162574f3e0d82e4a51dc6c1589b9f94/python/setup.py#L578]

it looks like 0.27 is supported.

> [Python] Cython compilation error on cython==0.27.3
> ---
>
> Key: ARROW-4024
> URL: https://issues.apache.org/jira/browse/ARROW-4024
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Priority: Major
>
> On the latest master, I'm getting the following error:
> {code:java}
> [ 11%] Compiling Cython CXX source for lib...
> Error compiling Cython file:
> 
> ...
>     out.init(type)
>     return out
> cdef object pyarrow_wrap_metadata(
>     ^
> 
> pyarrow/public-api.pxi:95:5: Function signature does not match previous 
> declaration
> CMakeFiles/lib_pyx.dir/build.make:57: recipe for target 'CMakeFiles/lib_pyx' 
> failed{code}
> With 0.29.0 it is working. This might have been introduced in 
> [https://github.com/apache/arrow/commit/12201841212967c78e31b2d2840b55b1707c4e7b]
>  but I'm not sure.





[jira] [Resolved] (ARROW-4015) [Plasma] remove legacy interfaces for plasma manager

2018-12-14 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-4015.
---
   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 3167
[https://github.com/apache/arrow/pull/3167]

> [Plasma] remove legacy interfaces for plasma manager
> 
>
> Key: ARROW-4015
> URL: https://issues.apache.org/jira/browse/ARROW-4015
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Zhijun Fu
>Assignee: Zhijun Fu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/arrow/issues/3154]
> In legacy ray, interacting with remote plasma stores is done via plasma 
> manager, which is part of ray, and plasma has a few interfaces to support it 
> - namely Fetch() and Wait().
> Currently the legacy ray code has already been removed, and the new raylet 
> uses object manager to interface with remote machine, and these legacy plasma 
> interfaces are no longer used. I think we could remove these legacy 
> interfaces to cleanup code and avoid confusion.
>  





[jira] [Created] (ARROW-4025) [Python] TensorFlow/PyTorch arrow ThreadPool workarounds not working in some settings

2018-12-13 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-4025:
-

 Summary: [Python] TensorFlow/PyTorch arrow ThreadPool workarounds 
not working in some settings
 Key: ARROW-4025
 URL: https://issues.apache.org/jira/browse/ARROW-4025
 Project: Apache Arrow
  Issue Type: Improvement
Affects Versions: 0.11.1
Reporter: Philipp Moritz


See the bug report in [https://github.com/ray-project/ray/issues/3520]

I wonder if we can revisit this issue and try to get rid of the workarounds we 
tried to deploy in the past.

See also the discussion in [https://github.com/apache/arrow/pull/2096]





[jira] [Created] (ARROW-4024) [Python] Cython compilation error on cython==0.27.3

2018-12-13 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-4024:
-

 Summary: [Python] Cython compilation error on cython==0.27.3
 Key: ARROW-4024
 URL: https://issues.apache.org/jira/browse/ARROW-4024
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


On the latest master, I'm getting the following error:
{code:java}
[ 11%] Compiling Cython CXX source for lib...
Error compiling Cython file:

...
    out.init(type)
    return out

cdef object pyarrow_wrap_metadata(
    ^

pyarrow/public-api.pxi:95:5: Function signature does not match previous 
declaration
CMakeFiles/lib_pyx.dir/build.make:57: recipe for target 'CMakeFiles/lib_pyx' 
failed{code}
With 0.29.0 it is working. This might have been introduced in 
[https://github.com/apache/arrow/commit/12201841212967c78e31b2d2840b55b1707c4e7b]
 but I'm not sure.





[jira] [Resolved] (ARROW-3950) [Plasma] Don't force loading the TensorFlow op on import

2018-12-07 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-3950.
---
   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 3117
[https://github.com/apache/arrow/pull/3117]

> [Plasma] Don't force loading the TensorFlow op on import
> 
>
> Key: ARROW-3950
> URL: https://issues.apache.org/jira/browse/ARROW-3950
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In certain situations, users want more control over when the TensorFlow op is 
> loaded, so we should make it optional (even if it exists). This happens in 
> Ray for example, where we need to make sure that if multiple python workers 
> try to compile and import the TensorFlow op in parallel, there is no race 
> condition (e.g. one worker could try to import a half-built version of the 
> op).





[jira] [Created] (ARROW-3958) [Plasma] Reduce number of IPCs

2018-12-07 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-3958:
-

 Summary: [Plasma] Reduce number of IPCs
 Key: ARROW-3958
 URL: https://issues.apache.org/jira/browse/ARROW-3958
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Plasma (C++)
Affects Versions: 0.11.1
Reporter: Philipp Moritz
Assignee: Philipp Moritz
 Fix For: 0.12.0


Currently we ship file descriptors of objects from the store to the client 
every time an object is created or gotten. There are relatively few distinct 
file descriptors, so caching them on the client can eliminate one IPC in the 
majority of cases.
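The caching idea can be sketched as a client-side memo keyed by the store's file-descriptor id. The names here are hypothetical; the real change would live in the C++ client:
{code:java}
class FdCache:
    # Cache mmap'd segments by the store's fd id so repeated Create/Get
    # calls reuse the mapping instead of paying an extra IPC each time.
    def __init__(self, receive_fd):
        self._receive_fd = receive_fd  # IPC to fetch and mmap a new fd
        self._segments = {}

    def segment(self, fd_id):
        if fd_id not in self._segments:  # only a miss triggers the IPC
            self._segments[fd_id] = self._receive_fd(fd_id)
        return self._segments[fd_id]

ipc_calls = []

def fake_receive_fd(fd_id):
    ipc_calls.append(fd_id)
    return "mmap-segment-%d" % fd_id

cache = FdCache(fake_receive_fd)
assert cache.segment(7) == "mmap-segment-7"
assert cache.segment(7) == "mmap-segment-7"  # served from the cache
assert ipc_calls == [7]                      # only one IPC was issued
{code}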





[jira] [Created] (ARROW-3950) [Plasma] Don't force loading the TensorFlow op on import

2018-12-06 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-3950:
-

 Summary: [Plasma] Don't force loading the TensorFlow op on import
 Key: ARROW-3950
 URL: https://issues.apache.org/jira/browse/ARROW-3950
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz
Assignee: Philipp Moritz


In certain situations, users want more control over when the TensorFlow op is 
loaded, so we should make it optional (even if it exists). This happens in Ray 
for example, where we need to make sure that if multiple python workers try to 
compile and import the TensorFlow op in parallel, there is no race condition 
(e.g. one worker could try to import a half-built version of the op).





[jira] [Created] (ARROW-3934) [Gandiva] Don't compile precompiled tests if ARROW_GANDIVA_BUILD_TESTS=off

2018-12-03 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-3934:
-

 Summary: [Gandiva] Don't compile precompiled tests if 
ARROW_GANDIVA_BUILD_TESTS=off
 Key: ARROW-3934
 URL: https://issues.apache.org/jira/browse/ARROW-3934
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz
Assignee: Philipp Moritz
 Fix For: 0.12.0


Currently the precompiled tests are compiled in any case, even if 
ARROW_GANDIVA_BUILD_TESTS=off.





[jira] [Resolved] (ARROW-3199) [Plasma] Check for EAGAIN in recvmsg and sendmsg

2018-12-03 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-3199.
---
Resolution: Fixed

Issue resolved by pull request 2551
[https://github.com/apache/arrow/pull/2551]

> [Plasma] Check for EAGAIN in recvmsg and sendmsg
> 
>
> Key: ARROW-3199
> URL: https://issues.apache.org/jira/browse/ARROW-3199
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> It turns out that 
> [https://github.com/apache/arrow/blob/673125fd416cbd2e5c2cb9cb6a4c925adecdaf2c/cpp/src/plasma/fling.cc#L63]
>  and probably also 
> [https://github.com/apache/arrow/blob/673125fd416cbd2e5c2cb9cb6a4c925adecdaf2c/cpp/src/plasma/fling.cc#L49]
>  can block and give an EAGAIN error.
> This was discovered during stress tests by https://github.com/stephanie-wang/
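The fix amounts to treating EAGAIN as "retry" rather than a hard error. A minimal Python sketch of the pattern (the actual fix lives in the C++ `fling.cc` code; `send_all` is an illustrative name):

```python
import errno

def send_all(sock, data):
    # Keep sending until all bytes are written, retrying on EAGAIN /
    # EWOULDBLOCK / EINTR instead of surfacing them as errors.
    view = memoryview(data)
    while view:
        try:
            n = sock.send(view)
        except (BlockingIOError, InterruptedError) as e:
            if e.errno not in (errno.EAGAIN, errno.EWOULDBLOCK, errno.EINTR):
                raise
            continue  # a real implementation would select()/poll() here
        view = view[n:]
```

The same pattern applies on the receive side: an EAGAIN from `recvmsg` means the call should be retried once the socket is readable again, not propagated as a failure.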





[jira] [Resolved] (ARROW-2759) Export notification socket of Plasma

2018-12-03 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-2759.
---
   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 3008
[https://github.com/apache/arrow/pull/3008]

> Export notification socket of Plasma
> 
>
> Key: ARROW-2759
> URL: https://issues.apache.org/jira/browse/ARROW-2759
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++), Python
>Reporter: Siyuan Zhuang
>Assignee: Siyuan Zhuang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently, I am implementing an async interface for Ray. The implementation 
> needs some kind of message polling methods like `get_next_notification`.
>  Unfortunately, I find `get_next_notification` in 
> `[https://github.com/apache/arrow/blob/master/python/pyarrow/_plasma.pyx]` 
> blocking, which is an impediment to implementing async utilities. Also, it's 
> hard to check the status of the socket (it could be closed or broken). So I 
> suggest exporting the notification socket so that there is more flexibility.





[jira] [Resolved] (ARROW-3920) Plasma reference counting not properly done in TensorFlow custom operator.

2018-12-01 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-3920.
---
   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 3061
[https://github.com/apache/arrow/pull/3061]

> Plasma reference counting not properly done in TensorFlow custom operator.
> --
>
> Key: ARROW-3920
> URL: https://issues.apache.org/jira/browse/ARROW-3920
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Robert Nishihara
>Assignee: Robert Nishihara
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We never call {{Release}} in the custom op code.





[jira] [Assigned] (ARROW-3919) [Python] Support 64 bit indices for pyarrow.serialize and pyarrow.deserialize

2018-11-30 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz reassigned ARROW-3919:
-

Assignee: Philipp Moritz

> [Python] Support 64 bit indices for pyarrow.serialize and pyarrow.deserialize
> -
>
> Key: ARROW-3919
> URL: https://issues.apache.org/jira/browse/ARROW-3919
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>
> see https://github.com/modin-project/modin/issues/266





[jira] [Created] (ARROW-3919) [Python] Support 64 bit indices for pyarrow.serialize and pyarrow.deserialize

2018-11-30 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-3919:
-

 Summary: [Python] Support 64 bit indices for pyarrow.serialize and 
pyarrow.deserialize
 Key: ARROW-3919
 URL: https://issues.apache.org/jira/browse/ARROW-3919
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


see https://github.com/modin-project/modin/issues/266





[jira] [Commented] (ARROW-3199) [Plasma] Check for EAGAIN in recvmsg and sendmsg

2018-11-29 Thread Philipp Moritz (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704017#comment-16704017
 ] 

Philipp Moritz commented on ARROW-3199:
---

Sorry, I was travelling last week! Yeah, let me finish this PR.

> [Plasma] Check for EAGAIN in recvmsg and sendmsg
> 
>
> Key: ARROW-3199
> URL: https://issues.apache.org/jira/browse/ARROW-3199
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> It turns out that 
> [https://github.com/apache/arrow/blob/673125fd416cbd2e5c2cb9cb6a4c925adecdaf2c/cpp/src/plasma/fling.cc#L63]
>  and probably also 
> [https://github.com/apache/arrow/blob/673125fd416cbd2e5c2cb9cb6a4c925adecdaf2c/cpp/src/plasma/fling.cc#L49]
>  can block and give an EAGAIN error.
> This was discovered during stress tests by https://github.com/stephanie-wang/





[jira] [Resolved] (ARROW-3765) [Gandiva] Segfault when the validity bitmap has not been allocated

2018-11-15 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-3765.
---
   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 2967
[https://github.com/apache/arrow/pull/2967]

> [Gandiva] Segfault when the validity bitmap has not been allocated
> --
>
> Key: ARROW-3765
> URL: https://issues.apache.org/jira/browse/ARROW-3765
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Gandiva
>Reporter: Siyuan Zhuang
>Assignee: Siyuan Zhuang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This is because the `validity buffer` could be `None`:
> {code}
> >>> df = pd.DataFrame(np.random.randint(0, 100, size=(2**12, 10)))
> >>> pa.Table.from_pandas(df).to_batches()[0].column(0).buffers()
> [None, &lt;pyarrow.Buffer&gt;]
> >>> df = pd.DataFrame(np.random.randint(0, 100, size=(2**12, 10))*1.0)
> >>> pa.Table.from_pandas(df).to_batches()[0].column(0).buffers()
> [&lt;pyarrow.Buffer&gt;, &lt;pyarrow.Buffer at 0x11a2b3228&gt;]{code}
> But Gandiva does not handle this case yet, and thus dereferences a nullptr:
> {code}
> void Annotator::PrepareBuffersForField(const FieldDescriptor& desc, const 
> arrow::ArrayData& array_data, EvalBatch* eval_batch) { 
> int buffer_idx = 0;
> // TODO:  
> // - validity is optional 
> uint8_t* validity_buf = 
> const_cast&lt;uint8_t*&gt;(array_data.buffers[buffer_idx]-&gt;data());
> eval_batch->SetBuffer(desc.validity_idx(), validity_buf);
> ++buffer_idx;
> {code}
>  
> Reproduce code:
> {code:java}
> df = pd.DataFrame(np.random.randint(0, 100, size=(2**22, 10)))
> table = pa.Table.from_pandas(df)
> filt = ...  # Create any gandiva filter
> r = filt.evaluate(table.to_batches()[0], pa.default_memory_pool()) # 
> segfault{code}
>  Backtrace:
> {code:java}
> * thread #2, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
> (code=1, address=0x10)
>  * frame #0: 0x0001060184fc 
> libarrow.12.dylib`arrow::Buffer::data(this=0x) const at 
> buffer.h:162
>  frame #1: 0x000106fbed78 
> libgandiva.12.dylib`gandiva::Annotator::PrepareBuffersForField(this=0x000100624dc8,
>  desc=0x00010101e138, array_data=0x00010061f8e8, 
> eval_batch=0x000100796848) at annotator.cc:65
>  frame #2: 0x000106fbf4ed 
> libgandiva.12.dylib`gandiva::Annotator::PrepareEvalBatch(this=0x000100624dc8,
>  record_batch=0x0001007a45b8, out_vector=size=1) at annotator.cc:94
>  frame #3: 0x0001071449b7 
> libgandiva.12.dylib`gandiva::LLVMGenerator::Execute(this=0x000100624da0, 
> record_batch=0x0001007a45b8, output_vector=size=1) at 
> llvm_generator.cc:102
>  frame #4: 0x000107059a4f 
> libgandiva.12.dylib`gandiva::Filter::Evaluate(this=0x00010079c668, 
> batch=0x0001007a45b8, 
> out_selection=std::__1::shared_ptr::element_type @ 
> 0x0001007a43e8 strong=2 weak=1) at filter.cc:106
>  frame #5: 0x00010948e002 
> gandiva.cpython-36m-darwin.so`__pyx_pw_7pyarrow_7gandiva_6Filter_3evaluate(_object*,
>  _object*, _object*) + 1986
>  frame #6: 0x000100140e8b Python`_PyCFunction_FastCallDict + 475
>  frame #7: 0x0001001d28ca Python`call_function + 602
>  frame #8: 0x0001001cf798 Python`_PyEval_EvalFrameDefault + 24616
>  frame #9: 0x0001001d3cf9 Python`fast_function + 569
>  frame #10: 0x0001001d2899 Python`call_function + 553
>  frame #11: 0x0001001cf798 Python`_PyEval_EvalFrameDefault + 24616
>  frame #12: 0x0001001d34c6 Python`_PyEval_EvalCodeWithName + 2902
>  frame #13: 0x0001001c96e0 Python`PyEval_EvalCode + 48
>  frame #14: 0x0001002029ae Python`PyRun_FileExFlags + 174
>  frame #15: 0x000100201f75 Python`PyRun_SimpleFileExFlags + 277
>  frame #16: 0x00010021ef46 Python`Py_Main + 3558
>  frame #17: 0x00010e08 Python`___lldb_unnamed_symbol1$$Python + 248
>  frame #18: 0x7fff6ea72085 libdyld.dylib`start + 1{code}
>  





[jira] [Resolved] (ARROW-3751) [Python] Add more cython bindings for gandiva

2018-11-14 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-3751.
---
   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 2936
[https://github.com/apache/arrow/pull/2936]

> [Python] Add more cython bindings for gandiva
> -
>
> Key: ARROW-3751
> URL: https://issues.apache.org/jira/browse/ARROW-3751
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Gandiva, Python
>Reporter: Siyuan Zhuang
>Assignee: Siyuan Zhuang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Some Cython bindings (MakeAdd, MakeOr, MakeIn) were left out of ARROW-3602.





[jira] [Assigned] (ARROW-3746) [Gandiva] [Python] Make it possible to list all functions registered with Gandiva

2018-11-11 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz reassigned ARROW-3746:
-

Assignee: Philipp Moritz

> [Gandiva] [Python] Make it possible to list all functions registered with 
> Gandiva
> -
>
> Key: ARROW-3746
> URL: https://issues.apache.org/jira/browse/ARROW-3746
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This will also be useful for documentation purposes (right now it is not very 
> easy to get a list of all the functions that are registered).





[jira] [Resolved] (ARROW-3746) [Gandiva] [Python] Make it possible to list all functions registered with Gandiva

2018-11-11 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-3746.
---
   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 2933
[https://github.com/apache/arrow/pull/2933]

> [Gandiva] [Python] Make it possible to list all functions registered with 
> Gandiva
> -
>
> Key: ARROW-3746
> URL: https://issues.apache.org/jira/browse/ARROW-3746
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This will also be useful for documentation purposes (right now it is not very 
> easy to get a list of all the functions that are registered).





[jira] [Resolved] (ARROW-3742) Fix pyarrow.types & gandiva cython bindings

2018-11-09 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-3742.
---
   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 2931
[https://github.com/apache/arrow/pull/2931]

> Fix pyarrow.types & gandiva cython bindings
> ---
>
> Key: ARROW-3742
> URL: https://issues.apache.org/jira/browse/ARROW-3742
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Gandiva, Python
>Reporter: Siyuan Zhuang
>Assignee: Siyuan Zhuang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> 1. 'types.py' didn't export `_as_type`, causing failures in certain 
> cython/python combinations. I am surprised to see that the CI didn't fail.
> 2. After updating the gandiva cpp part (ARROW-3587), the cython bindings 
> (ARROW-3602) are not consistent.





[jira] [Created] (ARROW-3746) [Gandiva] [Python] Make it possible to list all functions registered with Gandiva

2018-11-09 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-3746:
-

 Summary: [Gandiva] [Python] Make it possible to list all functions 
registered with Gandiva
 Key: ARROW-3746
 URL: https://issues.apache.org/jira/browse/ARROW-3746
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


This will also be useful for documentation purposes (right now it is not very 
easy to get a list of all the functions that are registered).





[jira] [Assigned] (ARROW-3602) [Gandiva] [Python] Add preliminary Cython bindings for Gandiva

2018-11-08 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz reassigned ARROW-3602:
-

Assignee: Philipp Moritz

> [Gandiva] [Python] Add preliminary Cython bindings for Gandiva
> --
>
> Key: ARROW-3602
> URL: https://issues.apache.org/jira/browse/ARROW-3602
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.11.1
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Adding a first version of Cython bindings to Gandiva so it can be called from 
> Python.





[jira] [Resolved] (ARROW-3587) [Python] Efficient serialization for Arrow Objects (array, table, tensor, etc)

2018-11-08 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-3587.
---
   Resolution: Fixed
Fix Version/s: 0.12.0

Issue resolved by pull request 2832
[https://github.com/apache/arrow/pull/2832]

> [Python] Efficient serialization for Arrow Objects (array, table, tensor, etc)
> --
>
> Key: ARROW-3587
> URL: https://issues.apache.org/jira/browse/ARROW-3587
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Plasma (C++), Python
>Reporter: Siyuan Zhuang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Currently, Arrow seems to have poor serialization support for its own objects.
> For example,
>   
> {code}
> import pyarrow 
> arr = pyarrow.array([1, 2, 3, 4]) 
> pyarrow.serialize(arr)
> {code}
> Traceback (most recent call last):
>  File "&lt;stdin&gt;", line 1, in &lt;module&gt;
>  File "pyarrow/serialization.pxi", line 337, in pyarrow.lib.serialize
>  File "pyarrow/serialization.pxi", line 136, in 
> pyarrow.lib.SerializationContext._serialize_callback
>  pyarrow.lib.SerializationCallbackError: pyarrow does not know how to 
> serialize objects of type &lt;class 'pyarrow.lib.Int64Array'&gt;.
> I am working on the Ray & Modin projects, using Plasma to store Arrow objects. The lack 
> of direct serialization support harms performance, so I would like to 
> push a PR to fix this problem.
> I wonder if it is welcome or is there someone else doing it?





[jira] [Created] (ARROW-3721) [Gandiva] [Python] Support all Gandiva literals

2018-11-08 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-3721:
-

 Summary: [Gandiva] [Python] Support all Gandiva literals
 Key: ARROW-3721
 URL: https://issues.apache.org/jira/browse/ARROW-3721
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


Support all the literals from 
[https://github.com/apache/arrow/blob/5b116ab175292fe70ed3c8727bcc6868b9695f4a/cpp/src/gandiva/tree_expr_builder.h#L35]
 in the Cython bindings.





[jira] [Resolved] (ARROW-3718) [Gandiva] Remove spurious gtest include

2018-11-08 Thread Philipp Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz resolved ARROW-3718.
---
Resolution: Fixed

Issue resolved by pull request 2917
[https://github.com/apache/arrow/pull/2917]

> [Gandiva] Remove spurious gtest include
> ---
>
> Key: ARROW-3718
> URL: https://issues.apache.org/jira/browse/ARROW-3718
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Gandiva
>Affects Versions: 0.11.1
>Reporter: Philipp Moritz
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> At the moment, cpp/src/gandiva/expr_decomposer.h includes a gtest header 
> which can prevent Gandiva from being built without the gtest dependency.




