[jira] [Created] (ARROW-17719) [Python] Improve error message when all values in a column are null in a parquet partition
Philipp Moritz created ARROW-17719: -- Summary: [Python] Improve error message when all values in a column are null in a parquet partition Key: ARROW-17719 URL: https://issues.apache.org/jira/browse/ARROW-17719 Project: Apache Arrow Issue Type: Improvement Components: Python Affects Versions: 9.0.0 Reporter: Philipp Moritz Fix For: 10.0.0 There is a good bug report about this in [https://stackoverflow.com/a/70568419/10891801] and it still seems to be a problem. Basically the error message is pretty bad if all values in a given column of a parquet partition are null. We should either handle this case better or give a better error message. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17079) Improve error message propagation from AWS SDK
Philipp Moritz created ARROW-17079: -- Summary: Improve error message propagation from AWS SDK Key: ARROW-17079 URL: https://issues.apache.org/jira/browse/ARROW-17079 Project: Apache Arrow Issue Type: Improvement Components: C++ Affects Versions: 8.0.0 Reporter: Philipp Moritz Dear all, I'd like to see if there is interest in improving the error messages that originate from the AWS SDK. Especially when loading datasets from S3, there are many things that can go wrong, and the error messages that (Py)Arrow gives are not always the most actionable, especially if the call involves many different SDK functions. In particular, it would be great to have the following attached to each error message:
* A machine-parseable status code from the AWS SDK
* Information as to exactly which AWS SDK call failed, so it can be disambiguated for Arrow API calls that use multiple AWS SDK calls
In the ideal case, as a developer I could reconstruct the AWS SDK call that failed from the error message (e.g. in a form that allows me to run the API call via the "aws" CLI program) so I can debug errors and see how they relate to my AWS infrastructure. Any progress in this direction would be super helpful. For context: I was recently debugging some permission issues in S3 based on the current error codes and it was pretty hard to figure out what was going on (see [https://github.com/ray-project/ray/issues/19799#issuecomment-1185035602]). I'm happy to take a stab at this problem but might need some help. Is implementing a custom StatusDetail class for AWS errors and propagating errors that way the right approach here? [https://github.com/apache/arrow/blob/50f6fcad6cc09c06e78dcd09ad07218b86e689de/cpp/src/arrow/status.h#L110] All the best, Philipp. -- This message was sent by Atlassian Jira (v8.20.10#820010)
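As a sketch of what such a StatusDetail could carry, here is a standalone illustration with hypothetical class and field names, modeled loosely on the `type_id()`/`ToString()` shape of `arrow::StatusDetail` (it is not the actual Arrow or AWS SDK API):

```cpp
#include <sstream>
#include <string>
#include <utility>

// Hypothetical detail object attached to a failed status: it carries a
// machine-parseable AWS error code plus the name of the SDK call that
// failed, so callers can disambiguate operations that issue several calls.
class AwsErrorDetail {
 public:
  AwsErrorDetail(int aws_error_code, std::string sdk_call)
      : aws_error_code_(aws_error_code), sdk_call_(std::move(sdk_call)) {}

  // Mirrors the type_id()/ToString() shape of arrow::StatusDetail.
  const char* type_id() const { return "aws-error-detail"; }

  int aws_error_code() const { return aws_error_code_; }
  const std::string& sdk_call() const { return sdk_call_; }

  std::string ToString() const {
    std::ostringstream out;
    out << "AWS SDK call '" << sdk_call_ << "' failed with error code "
        << aws_error_code_;
    return out.str();
  }

 private:
  int aws_error_code_;    // machine-parseable code from the AWS SDK
  std::string sdk_call_;  // e.g. "GetObject" or "ListObjectsV2"
};
```

With a detail like this propagated through the status, the error message itself can name the exact SDK call, which is what makes it reconstructible as an "aws" CLI invocation.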
[jira] [Resolved] (ARROW-7991) [C++][Plasma] Allow option for evicting if full when creating an object
[ https://issues.apache.org/jira/browse/ARROW-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-7991. --- Fix Version/s: (was: 0.16.1) 1.0.0 Resolution: Fixed Issue resolved by pull request 6520 [https://github.com/apache/arrow/pull/6520] > [C++][Plasma] Allow option for evicting if full when creating an object > --- > > Key: ARROW-7991 > URL: https://issues.apache.org/jira/browse/ARROW-7991 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Affects Versions: 0.16.0 >Reporter: Stephanie Wang >Assignee: Stephanie Wang >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Plasma currently attempts to evict objects whenever the client tries to > create an object and there is not enough space. Sometimes, though, it is > preferable to allow the client to try something else, such as skipping > creation or freeing other objects. This enhancement would allow the client to > pass in a flag during object creation specifying whether objects should be > evicted or not. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-7998) [C++][Plasma] Make Seal requests synchronous
[ https://issues.apache.org/jira/browse/ARROW-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-7998. --- Fix Version/s: (was: 0.16.1) 1.0.0 Resolution: Fixed Issue resolved by pull request 6529 [https://github.com/apache/arrow/pull/6529] > [C++][Plasma] Make Seal requests synchronous > > > Key: ARROW-7998 > URL: https://issues.apache.org/jira/browse/ARROW-7998 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Affects Versions: 0.16.0 >Reporter: Stephanie Wang >Assignee: Stephanie Wang >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > When handling a `Seal` request to create an object and make it visible to > other clients, the plasma store does not wait until the seal is complete > before responding to the requesting client. This makes the interface hard to > use, since the client is not guaranteed that the object is visible yet and > would have to use an additional IPC round-trip to determine when the object > is ready. > > This improvement would require the plasma store to wait until the object has > been created before responding to the client. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-7522) [C++][Plasma] Broken Record Batch returned from a function call
[ https://issues.apache.org/jira/browse/ARROW-7522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013588#comment-17013588 ] Philipp Moritz commented on ARROW-7522: --- Disconnecting from the PlasmaClient and letting it go out of scope is fine, the memory mapped files will still be kept alive. The problem in this code example is that the buffer of the object is not kept alive (the buffer in the `auto buffer = object_buffer.data;` line). If that buffer is kept alive, this shared pointer here [https://github.com/apache/arrow/blob/b218a7fdae0792e185579d8cd20748ed0752b9ff/cpp/src/plasma/client.cc#L137] will make sure the PlasmaClient is kept alive, which will make sure the memory maps are kept alive. To fix this, we would need some way to set a "base" object of an `arrow::RecordBatch` (similar to numpy base objects) which will make sure the backing buffer is kept alive. As a workaround you can also keep the PlasmaClient alive, but that feels very brittle.
> [C++][Plasma] Broken Record Batch returned from a function call
> ---
>
> Key: ARROW-7522
> URL: https://issues.apache.org/jira/browse/ARROW-7522
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, C++ - Plasma
> Affects Versions: 0.15.1
> Environment: macOS
> Reporter: Chengxin Ma
> Priority: Minor
>
> Scenario: retrieving Record Batch from Plasma with known Object ID.
> The following code snippet works well:
> {code:java}
> int main(int argc, char **argv)
> {
>     plasma::ObjectID object_id = plasma::ObjectID::from_binary("0FF1CE00C0FFEE00BEEF");
>     // Start up and connect a Plasma client.
>     plasma::PlasmaClient client;
>     ARROW_CHECK_OK(client.Connect("/tmp/store"));
>     plasma::ObjectBuffer object_buffer;
>     ARROW_CHECK_OK(client.Get(&object_id, 1, -1, &object_buffer));
>     // Retrieve object data.
>     auto buffer = object_buffer.data;
>     arrow::io::BufferReader buffer_reader(buffer);
>     std::shared_ptr<arrow::ipc::RecordBatchReader> record_batch_stream_reader;
>     ARROW_CHECK_OK(arrow::ipc::RecordBatchStreamReader::Open(&buffer_reader, &record_batch_stream_reader));
>     std::shared_ptr<arrow::RecordBatch> record_batch;
>     arrow::Status status = record_batch_stream_reader->ReadNext(&record_batch);
>     std::cout << "record_batch->column_name(0): " << record_batch->column_name(0) << std::endl;
>     std::cout << "record_batch->num_columns(): " << record_batch->num_columns() << std::endl;
>     std::cout << "record_batch->num_rows(): " << record_batch->num_rows() << std::endl;
>     std::cout << "record_batch->column(0)->length(): " << record_batch->column(0)->length() << std::endl;
>     std::cout << "record_batch->column(0)->ToString(): " << record_batch->column(0)->ToString() << std::endl;
> }
> {code}
> {{record_batch->column(0)->ToString()}} would incur a segmentation fault if retrieving the Record Batch is wrapped in a function:
> {code:java}
> std::shared_ptr<arrow::RecordBatch> GetRecordBatchFromPlasma(plasma::ObjectID object_id)
> {
>     // Start up and connect a Plasma client.
>     plasma::PlasmaClient client;
>     ARROW_CHECK_OK(client.Connect("/tmp/store"));
>     plasma::ObjectBuffer object_buffer;
>     ARROW_CHECK_OK(client.Get(&object_id, 1, -1, &object_buffer));
>     // Retrieve object data.
>     auto buffer = object_buffer.data;
>     arrow::io::BufferReader buffer_reader(buffer);
>     std::shared_ptr<arrow::ipc::RecordBatchReader> record_batch_stream_reader;
>     ARROW_CHECK_OK(arrow::ipc::RecordBatchStreamReader::Open(&buffer_reader, &record_batch_stream_reader));
>     std::shared_ptr<arrow::RecordBatch> record_batch;
>     arrow::Status status = record_batch_stream_reader->ReadNext(&record_batch);
>     // Disconnect the client.
>     ARROW_CHECK_OK(client.Disconnect());
>     return record_batch;
> }
>
> int main(int argc, char **argv)
> {
>     plasma::ObjectID object_id = plasma::ObjectID::from_binary("0FF1CE00C0FFEE00BEEF");
>     std::shared_ptr<arrow::RecordBatch> record_batch = GetRecordBatchFromPlasma(object_id);
>     std::cout << "record_batch->column_name(0): " << record_batch->column_name(0) << std::endl;
>     std::cout << "record_batch->num_columns(): " << record_batch->num_columns() << std::endl;
>     std::cout << "record_batch->num_rows(): " << record_batch->num_rows() << std::endl;
>     std::cout << "record_batch->column(0)->length(): " << record_batch->column(0)->length() << std::endl;
>     std::cout << "record_batch->column(0)->ToString(): " << record_batch->column(0)->ToString() << std::endl;
> }
> {code}
> The meta info of the Record Batch such as the number of columns and rows is still available, but I can't see the content of the columns. {{lldb}} says that the stop reason is {{EXC_BAD_ACCESS}}, so I think the Record Batch is destroyed after {{GetRecordBatchFromPlasma}} finishes. But why can I still see the meta info of this
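The "base object" mechanism suggested in the comment above (similar to numpy base objects) can be sketched with the std::shared_ptr aliasing constructor. Everything below is a hypothetical standalone illustration, not Arrow API: a view that points into some storage but shares ownership of whatever keeps that storage alive, so the base cannot be destroyed while any view exists.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// A view over raw bytes that co-owns its backing storage. In the Plasma
// case the "base" would be whatever holds the memory-mapped file alive
// (ultimately the client); here a plain vector stands in for it.
struct BufferView {
  std::shared_ptr<const uint8_t> data;  // aliases the base's lifetime
  size_t size;
};

inline BufferView MakeView(std::shared_ptr<std::vector<uint8_t>> base) {
  // Aliasing constructor: the pointer refers to base->data(), but the
  // control block (and thus the lifetime) is shared with `base`.
  return BufferView{std::shared_ptr<const uint8_t>(base, base->data()),
                    base->size()};
}
```

Even after the last direct shared_ptr to the vector is reset, the view keeps the storage reachable; this is the lifetime coupling that the RecordBatch in the bug report is missing.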
[jira] [Resolved] (ARROW-7004) [Plasma] Make it possible to bump up object in LRU cache
[ https://issues.apache.org/jira/browse/ARROW-7004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-7004. --- Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 5741 [https://github.com/apache/arrow/pull/5741] > [Plasma] Make it possible to bump up object in LRU cache > > > Key: ARROW-7004 > URL: https://issues.apache.org/jira/browse/ARROW-7004 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > To avoid evicting objects too early, we sometimes want to bump a number of > objects up in the LRU cache. While it would be possible to call Get() on > these objects, this can be undesirable, since it blocks on the objects > if they are not available and will make it necessary to call Release() on > them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7004) [Plasma] Make it possible to bump up object in LRU cache
Philipp Moritz created ARROW-7004: - Summary: [Plasma] Make it possible to bump up object in LRU cache Key: ARROW-7004 URL: https://issues.apache.org/jira/browse/ARROW-7004 Project: Apache Arrow Issue Type: Improvement Components: C++ - Plasma Reporter: Philipp Moritz Assignee: Philipp Moritz To avoid evicting objects too early, we sometimes want to bump a number of objects up in the LRU cache. While it would be possible to call Get() on these objects, this can be undesirable, since it blocks on the objects if they are not available and will make it necessary to call Release() on them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
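The non-blocking bump described above can be sketched as a standard list-plus-hash-map LRU with a splice-to-front operation. This is a standalone sketch with hypothetical names, not the Plasma implementation: unlike Get(), Bump() neither blocks on missing objects nor takes a reference that would later need Release().

```cpp
#include <list>
#include <string>
#include <unordered_map>

// Minimal LRU queue: front = most recently used, back = eviction victim.
class LruQueue {
 public:
  void Add(const std::string& id) {
    order_.push_front(id);
    pos_[id] = order_.begin();
  }
  // Non-blocking bump: move an object to the most-recently-used end.
  // Unknown objects are a no-op (returns false) rather than a blocking wait.
  bool Bump(const std::string& id) {
    auto it = pos_.find(id);
    if (it == pos_.end()) return false;
    order_.splice(order_.begin(), order_, it->second);  // iterator stays valid
    return true;
  }
  // Evict the least recently used object; empty string if the queue is empty.
  std::string Evict() {
    if (order_.empty()) return "";
    std::string victim = order_.back();
    order_.pop_back();
    pos_.erase(victim);
    return victim;
  }

 private:
  std::list<std::string> order_;
  std::unordered_map<std::string, std::list<std::string>::iterator> pos_;
};
```

The splice keeps the operation O(1) and never touches the object's reference count, which is exactly why a bump is cheaper than a Get()/Release() pair.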
[jira] [Resolved] (ARROW-6907) [C++][Plasma] Allow Plasma store to batch notifications to clients
[ https://issues.apache.org/jira/browse/ARROW-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-6907. --- Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 5626 [https://github.com/apache/arrow/pull/5626] > [C++][Plasma] Allow Plasma store to batch notifications to clients > -- > > Key: ARROW-6907 > URL: https://issues.apache.org/jira/browse/ARROW-6907 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Reporter: Danyang >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-6824) [Plasma] Support batched create and seal requests for small objects
[ https://issues.apache.org/jira/browse/ARROW-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-6824. --- Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 5596 [https://github.com/apache/arrow/pull/5596] > [Plasma] Support batched create and seal requests for small objects > --- > > Key: ARROW-6824 > URL: https://issues.apache.org/jira/browse/ARROW-6824 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Affects Versions: 0.15.0 >Reporter: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently the plasma create API supports creating and sealing a single object > – this makes sense for large objects because their creation throughput is > limited by the memory throughput of the client when the data is filled into > the buffer. However, sometimes we want to create lots of small objects, in > which case the throughput is limited by the number of IPCs we can do to the > store when creating new objects. This can be fixed by offering a version of > CreateAndSeal that allows us to create multiple objects at the same time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6824) [Plasma] Support batched create and seal requests for small objects
Philipp Moritz created ARROW-6824: - Summary: [Plasma] Support batched create and seal requests for small objects Key: ARROW-6824 URL: https://issues.apache.org/jira/browse/ARROW-6824 Project: Apache Arrow Issue Type: Improvement Components: C++ - Plasma Affects Versions: 0.15.0 Reporter: Philipp Moritz Currently the plasma create API supports creating and sealing a single object – this makes sense for large objects because their creation throughput is limited by the memory throughput of the client when the data is filled into the buffer. However, sometimes we want to create lots of small objects, in which case the throughput is limited by the number of IPCs we can do to the store when creating new objects. This can be fixed by offering a version of CreateAndSeal that allows us to create multiple objects at the same time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
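The throughput argument above can be made concrete with a small sketch. The request types and names below are hypothetical, not the actual Plasma protocol: the point is only that packing many small (id, payload) pairs into one message pays the per-IPC overhead once per batch instead of once per object.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical wire shapes: one request per object vs. one request per batch.
struct CreateAndSealRequest {
  std::string object_id;
  std::vector<uint8_t> payload;
};

struct CreateAndSealBatchRequest {
  std::vector<CreateAndSealRequest> objects;  // one IPC covers all of these
};

// With a fixed per-message cost and a per-object cost, batching divides the
// message cost across the whole batch.
inline double CostPerObject(double per_message_cost, double per_object_cost,
                            size_t batch_size) {
  return per_object_cost + per_message_cost / static_cast<double>(batch_size);
}
```

For example, if an IPC round-trip costs 100 units and filling one small object costs 1 unit, creating objects one at a time costs 101 units each, while a batch of 100 brings that down to 2 units each, a 50x improvement in the IPC-bound regime.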
[jira] [Assigned] (ARROW-5955) [Plasma] Support setting memory quotas per plasma client for better isolation
[ https://issues.apache.org/jira/browse/ARROW-5955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz reassigned ARROW-5955: - Assignee: Eric Liang > [Plasma] Support setting memory quotas per plasma client for better isolation > - > > Key: ARROW-5955 > URL: https://issues.apache.org/jira/browse/ARROW-5955 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Affects Versions: 0.14.1 >Reporter: Eric Liang >Assignee: Eric Liang >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Currently, plasma evicts objects according to a global LRU queue. In Ray, this > often causes memory-intensive workloads to fail unpredictably, since a client > that creates objects at a high rate can evict objects created by clients at > lower rates. This is despite the fact that the true working set of both > clients may be quite small. > cc [~pcmoritz] -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (ARROW-5955) [Plasma] Support setting memory quotas per plasma client for better isolation
[ https://issues.apache.org/jira/browse/ARROW-5955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz updated ARROW-5955: -- Component/s: C++ - Plasma > [Plasma] Support setting memory quotas per plasma client for better isolation > - > > Key: ARROW-5955 > URL: https://issues.apache.org/jira/browse/ARROW-5955 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Reporter: Eric Liang >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Currently, plasma evicts objects according to a global LRU queue. In Ray, this > often causes memory-intensive workloads to fail unpredictably, since a client > that creates objects at a high rate can evict objects created by clients at > lower rates. This is despite the fact that the true working set of both > clients may be quite small. > cc [~pcmoritz] -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (ARROW-5955) [Plasma] Support setting memory quotas per plasma client for better isolation
[ https://issues.apache.org/jira/browse/ARROW-5955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz updated ARROW-5955: -- Affects Version/s: 0.14.1 > [Plasma] Support setting memory quotas per plasma client for better isolation > - > > Key: ARROW-5955 > URL: https://issues.apache.org/jira/browse/ARROW-5955 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Affects Versions: 0.14.1 >Reporter: Eric Liang >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Currently, plasma evicts objects according to a global LRU queue. In Ray, this > often causes memory-intensive workloads to fail unpredictably, since a client > that creates objects at a high rate can evict objects created by clients at > lower rates. This is despite the fact that the true working set of both > clients may be quite small. > cc [~pcmoritz] -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (ARROW-5955) [Plasma] Support setting memory quotas per plasma client for better isolation
[ https://issues.apache.org/jira/browse/ARROW-5955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-5955. --- Resolution: Fixed Fix Version/s: 1.0.0 Issue resolved by pull request 4885 [https://github.com/apache/arrow/pull/4885] > [Plasma] Support setting memory quotas per plasma client for better isolation > - > > Key: ARROW-5955 > URL: https://issues.apache.org/jira/browse/ARROW-5955 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Eric Liang >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Currently, plasma evicts objects according to a global LRU queue. In Ray, this > often causes memory-intensive workloads to fail unpredictably, since a client > that creates objects at a high rate can evict objects created by clients at > lower rates. This is despite the fact that the true working set of both > clients may be quite small. > cc [~pcmoritz] -- This message was sent by Atlassian JIRA (v7.6.14#76016)
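The isolation idea in this issue can be sketched as follows (hypothetical names; the actual implementation in the pull request may differ): each client gets its own usage counter and its own eviction queue, so a client that exceeds its quota evicts its own oldest objects instead of other clients' working sets.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>
#include <vector>

// Per-client quota bookkeeping on top of LRU eviction. This sketch ignores
// the shared global pool and cross-client fallback for simplicity.
class QuotaTracker {
 public:
  explicit QuotaTracker(size_t quota) : quota_(quota) {}

  // Records a new object for `client` and returns the ids of that client's
  // own objects that had to be evicted to make room under the quota.
  std::vector<std::string> Create(const std::string& client,
                                  const std::string& id, size_t size) {
    std::vector<std::string> evicted;
    ClientState& st = clients_[client];
    while (st.used + size > quota_ && !st.lru.empty()) {
      std::string victim = st.lru.front();  // this client's oldest object
      st.lru.pop_front();
      st.used -= sizes_[victim];
      sizes_.erase(victim);
      evicted.push_back(victim);
    }
    st.lru.push_back(id);
    st.used += size;
    sizes_[id] = size;
    return evicted;
  }

  size_t Used(const std::string& client) { return clients_[client].used; }

 private:
  struct ClientState {
    size_t used = 0;
    std::deque<std::string> lru;  // front = oldest
  };
  size_t quota_;
  std::unordered_map<std::string, ClientState> clients_;
  std::unordered_map<std::string, size_t> sizes_;
};
```

The key property is visible in the eviction loop: a fast producer only ever churns through its own queue, so a slow producer's small working set survives regardless of how quickly the other client creates objects.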
[jira] [Assigned] (ARROW-5560) Cannot create Plasma object after OutOfMemory error
[ https://issues.apache.org/jira/browse/ARROW-5560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz reassigned ARROW-5560: - Assignee: Richard Liaw > Cannot create Plasma object after OutOfMemory error > --- > > Key: ARROW-5560 > URL: https://issues.apache.org/jira/browse/ARROW-5560 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Plasma >Affects Versions: 0.13.0 >Reporter: Stephanie Wang >Assignee: Richard Liaw >Priority: Major > Labels: pull-request-available > Fix For: 0.14.1 > > Time Spent: 2h > Remaining Estimate: 0h > > If the client tries to call `CreateObject` and there is not enough memory > left in the object store to create it, an `OutOfMemory` error will be > returned. However, the plasma store also creates an entry for the object, > even though it failed to be created. This means that later on, if the client > tries to create the object again, it will receive an error that the object > already exists. Also, if the client tries to get the object, it will hang > because the entry appears to be unsealed. > We should fix this by only creating the object entry if the `CreateObject` > operation succeeds. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (ARROW-5560) Cannot create Plasma object after OutOfMemory error
[ https://issues.apache.org/jira/browse/ARROW-5560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-5560. --- Resolution: Fixed Fix Version/s: (was: 0.13.0) 0.14.1 Issue resolved by pull request 4850 [https://github.com/apache/arrow/pull/4850] > Cannot create Plasma object after OutOfMemory error > --- > > Key: ARROW-5560 > URL: https://issues.apache.org/jira/browse/ARROW-5560 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Plasma >Affects Versions: 0.13.0 >Reporter: Stephanie Wang >Priority: Major > Labels: pull-request-available > Fix For: 0.14.1 > > Time Spent: 2h > Remaining Estimate: 0h > > If the client tries to call `CreateObject` and there is not enough memory > left in the object store to create it, an `OutOfMemory` error will be > returned. However, the plasma store also creates an entry for the object, > even though it failed to be created. This means that later on, if the client > tries to create the object again, it will receive an error that the object > already exists. Also, if the client tries to get the object, it will hang > because the entry appears to be unsealed. > We should fix this by only creating the object entry if the `CreateObject` > operation succeeds. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (ARROW-5904) [Java] [Plasma] Fix compilation of Plasma Java client
[ https://issues.apache.org/jira/browse/ARROW-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz updated ARROW-5904: -- Component/s: C++ - Plasma > [Java] [Plasma] Fix compilation of Plasma Java client > - > > Key: ARROW-5904 > URL: https://issues.apache.org/jira/browse/ARROW-5904 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 0.14.1 > > Time Spent: 0.5h > Remaining Estimate: 0h > > This is broken since the introduction of user-defined Status messages: > {code:java} > external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc: > In function '_jobject* > Java_org_apache_arrow_plasma_PlasmaClientJNI_create(JNIEnv*, jclass, jlong, > jbyteArray, jint, jbyteArray)': > external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:114:9: > error: 'class arrow::Status' has no member named 'IsPlasmaObjectExists' >if (s.IsPlasmaObjectExists()) { > ^ > external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:120:9: > error: 'class arrow::Status' has no member named 'IsPlasmaStoreFull' >if (s.IsPlasmaStoreFull()) { > ^{code} > [~guoyuhong85] Can you add this codepath to the test so we can catch this > kind of breakage more quickly in the future? -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (ARROW-5904) [Java] [Plasma] Fix compilation of Plasma Java client
[ https://issues.apache.org/jira/browse/ARROW-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz reassigned ARROW-5904: - Assignee: Philipp Moritz > [Java] [Plasma] Fix compilation of Plasma Java client > - > > Key: ARROW-5904 > URL: https://issues.apache.org/jira/browse/ARROW-5904 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 0.14.1 > > Time Spent: 0.5h > Remaining Estimate: 0h > > This is broken since the introduction of user-defined Status messages: > {code:java} > external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc: > In function '_jobject* > Java_org_apache_arrow_plasma_PlasmaClientJNI_create(JNIEnv*, jclass, jlong, > jbyteArray, jint, jbyteArray)': > external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:114:9: > error: 'class arrow::Status' has no member named 'IsPlasmaObjectExists' >if (s.IsPlasmaObjectExists()) { > ^ > external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:120:9: > error: 'class arrow::Status' has no member named 'IsPlasmaStoreFull' >if (s.IsPlasmaStoreFull()) { > ^{code} > [~guoyuhong85] Can you add this codepath to the test so we can catch this > kind of breakage more quickly in the future? -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (ARROW-5904) [Java] [Plasma] Fix compilation of Plasma Java client
[ https://issues.apache.org/jira/browse/ARROW-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-5904. --- Resolution: Fixed Fix Version/s: 0.14.1 Issue resolved by pull request 4849 [https://github.com/apache/arrow/pull/4849] > [Java] [Plasma] Fix compilation of Plasma Java client > - > > Key: ARROW-5904 > URL: https://issues.apache.org/jira/browse/ARROW-5904 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 0.14.1 > > Time Spent: 20m > Remaining Estimate: 0h > > This is broken since the introduction of user-defined Status messages: > {code:java} > external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc: > In function '_jobject* > Java_org_apache_arrow_plasma_PlasmaClientJNI_create(JNIEnv*, jclass, jlong, > jbyteArray, jint, jbyteArray)': > external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:114:9: > error: 'class arrow::Status' has no member named 'IsPlasmaObjectExists' >if (s.IsPlasmaObjectExists()) { > ^ > external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:120:9: > error: 'class arrow::Status' has no member named 'IsPlasmaStoreFull' >if (s.IsPlasmaStoreFull()) { > ^{code} > [~guoyuhong85] Can you add this codepath to the test so we can catch this > kind of breakage more quickly in the future? -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (ARROW-5904) [Java] [Plasma] Fix compilation of Plasma Java client
[ https://issues.apache.org/jira/browse/ARROW-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883254#comment-16883254 ] Philipp Moritz commented on ARROW-5904: --- Sounds good, I will prepare a separate PR for this! > [Java] [Plasma] Fix compilation of Plasma Java client > - > > Key: ARROW-5904 > URL: https://issues.apache.org/jira/browse/ARROW-5904 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This is broken since the introduction of user-defined Status messages: > {code:java} > external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc: > In function '_jobject* > Java_org_apache_arrow_plasma_PlasmaClientJNI_create(JNIEnv*, jclass, jlong, > jbyteArray, jint, jbyteArray)': > external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:114:9: > error: 'class arrow::Status' has no member named 'IsPlasmaObjectExists' >if (s.IsPlasmaObjectExists()) { > ^ > external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:120:9: > error: 'class arrow::Status' has no member named 'IsPlasmaStoreFull' >if (s.IsPlasmaStoreFull()) { > ^{code} > [~guoyuhong85] Can you add this codepath to the test so we can catch this > kind of breakage more quickly in the future? -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (ARROW-5904) [Java] [Plasma] Fix compilation of Plasma Java client
[ https://issues.apache.org/jira/browse/ARROW-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882562#comment-16882562 ] Philipp Moritz commented on ARROW-5904: --- We do need a working build configuration that builds both the C++ and Java files in order to test this. In Ray we do this with a Bazel based build, which I'm happy to upstream and provide Docker files for. Would that help? > [Java] [Plasma] Fix compilation of Plasma Java client > - > > Key: ARROW-5904 > URL: https://issues.apache.org/jira/browse/ARROW-5904 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This is broken since the introduction of user-defined Status messages: > {code:java} > external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc: > In function '_jobject* > Java_org_apache_arrow_plasma_PlasmaClientJNI_create(JNIEnv*, jclass, jlong, > jbyteArray, jint, jbyteArray)': > external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:114:9: > error: 'class arrow::Status' has no member named 'IsPlasmaObjectExists' >if (s.IsPlasmaObjectExists()) { > ^ > external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:120:9: > error: 'class arrow::Status' has no member named 'IsPlasmaStoreFull' >if (s.IsPlasmaStoreFull()) { > ^{code} > [~guoyuhong85] Can you add this codepath to the test so we can catch this > kind of breakage more quickly in the future? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5904) [Java] [Plasma] Fix compilation of Plasma Java client
[ https://issues.apache.org/jira/browse/ARROW-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882490#comment-16882490 ] Philipp Moritz commented on ARROW-5904: --- Looks like this is not currently tested because of https://issues.apache.org/jira/browse/ARROW-4764. > [Java] [Plasma] Fix compilation of Plasma Java client > - > > Key: ARROW-5904 > URL: https://issues.apache.org/jira/browse/ARROW-5904 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This is broken since the introduction of user-defined Status messages: > {code:java} > external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc: > In function '_jobject* > Java_org_apache_arrow_plasma_PlasmaClientJNI_create(JNIEnv*, jclass, jlong, > jbyteArray, jint, jbyteArray)': > external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:114:9: > error: 'class arrow::Status' has no member named 'IsPlasmaObjectExists' >if (s.IsPlasmaObjectExists()) { > ^ > external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:120:9: > error: 'class arrow::Status' has no member named 'IsPlasmaStoreFull' >if (s.IsPlasmaStoreFull()) { > ^{code} > [~guoyuhong85] Can you add this codepath to the test so we can catch this > kind of breakage more quickly in the future? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5904) [Java] [Plasma] Fix compilation of Plasma Java client
Philipp Moritz created ARROW-5904: - Summary: [Java] [Plasma] Fix compilation of Plasma Java client Key: ARROW-5904 URL: https://issues.apache.org/jira/browse/ARROW-5904 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz This is broken since the introduction of user-defined Status messages: {code:java} external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc: In function '_jobject* Java_org_apache_arrow_plasma_PlasmaClientJNI_create(JNIEnv*, jclass, jlong, jbyteArray, jint, jbyteArray)': external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:114:9: error: 'class arrow::Status' has no member named 'IsPlasmaObjectExists' if (s.IsPlasmaObjectExists()) { ^ external/plasma/cpp/src/plasma/lib/java/org_apache_arrow_plasma_PlasmaClientJNI.cc:120:9: error: 'class arrow::Status' has no member named 'IsPlasmaStoreFull' if (s.IsPlasmaStoreFull()) { ^{code} [~guoyuhong85] Can you add this codepath to the test so we can catch this kind of breakage more quickly in the future? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5751) [Packaging][Python] Python 2.7 wheels broken on macOS: libcares.2.dylib not found
[ https://issues.apache.org/jira/browse/ARROW-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz updated ARROW-5751: -- Priority: Blocker (was: Major) > [Packaging][Python] Python 2.7 wheels broken on macOS: libcares.2.dylib not > found > - > > Key: ARROW-5751 > URL: https://issues.apache.org/jira/browse/ARROW-5751 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Priority: Blocker > > I'm afraid while [https://github.com/apache/arrow/pull/4685] fixed the macOS > wheels for python 3, but the python 2.7 wheel is still broken (with a > different error): > {code:java} > ImportError: > dlopen(/Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, > 2): Library not loaded: /usr/local/opt/c-ares/lib/libcares.2.dylib > Referenced from: > /Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow_python.14.dylib > Reason: image not found{code} > I tried the same hack as in [https://github.com/apache/arrow/pull/4685] for > libcares but it doesn't work (removing the .dylib fails one of the earlier > build steps). I think the only way to go forward on this is to compile grpc > ourselves. My attempt to do this in > [https://github.com/apache/arrow/compare/master...pcmoritz:mac-wheels-py2] > fails because OpenSSL is not found even though I'm specifying the > OPENSSL_ROOT_DIR (see > [https://travis-ci.org/pcmoritz/crossbow/builds/550603543]). Let me know if > you have any ideas how to fix this! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5751) [Packaging][Python] Python 2.7 wheels broken on macOS: libcares.2.dylib not found
Philipp Moritz created ARROW-5751: - Summary: [Packaging][Python] Python 2.7 wheels broken on macOS: libcares.2.dylib not found Key: ARROW-5751 URL: https://issues.apache.org/jira/browse/ARROW-5751 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz I'm afraid that while [https://github.com/apache/arrow/pull/4685] fixed the macOS wheels for Python 3, the Python 2.7 wheel is still broken (with a different error): {code:java} ImportError: dlopen(/Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, 2): Library not loaded: /usr/local/opt/c-ares/lib/libcares.2.dylib Referenced from: /Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow_python.14.dylib Reason: image not found{code} I tried the same hack as in [https://github.com/apache/arrow/pull/4685] for libcares but it doesn't work (removing the .dylib fails one of the earlier build steps). I think the only way to go forward on this is to compile grpc ourselves. My attempt to do this in [https://github.com/apache/arrow/compare/master...pcmoritz:mac-wheels-py2] fails because OpenSSL is not found even though I'm specifying the OPENSSL_ROOT_DIR (see [https://travis-ci.org/pcmoritz/crossbow/builds/550603543]). Let me know if you have any ideas on how to fix this! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5690) [Packaging][Python] macOS wheels broken: libprotobuf.18.dylib missing
[ https://issues.apache.org/jira/browse/ARROW-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870781#comment-16870781 ] Philipp Moritz commented on ARROW-5690: --- Linking protobuf statically leads to the following error: {code:java} ImportError: dlopen(/Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, 2): Library not loaded: /usr/local/opt/grpc/lib/libgrpc++.dylib Referenced from: /Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow_python.14.dylib Reason: image not found{code} So we might need to bundle GRPC (but I'm not sure about that). Do we have any configurations in the build system where we do that at the moment? > [Packaging][Python] macOS wheels broken: libprotobuf.18.dylib missing > - > > Key: ARROW-5690 > URL: https://issues.apache.org/jira/browse/ARROW-5690 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Blocker > Fix For: 0.14.0 > > > If I build macOS arrow wheels with crossbow from the latest master > (a77257f4790c562dcb74724fc4a22c157ab36018) and install them, importing > pyarrow gives the following error message: > {code:java} > In [1]: import pyarrow > > > --- > ImportError Traceback (most recent call last) > in > > 1 import pyarrow > ~/anaconda3/lib/python3.6/site-packages/pyarrow/__init__.py in > 47 import pyarrow.compat as compat > 48 > ---> 49 from pyarrow.lib import cpu_count, set_cpu_count > 50 from pyarrow.lib import (null, bool_, > 51 int8, int16, int32, int64, > ImportError: > dlopen(/Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, > 2): Library not loaded: /usr/local/opt/protobuf/lib/libprotobuf.18.dylib > Referenced from: > /Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow.14.dylib > Reason: image not found{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-5690) [Packaging][Python] macOS wheels broken: libprotobuf.18.dylib missing
[ https://issues.apache.org/jira/browse/ARROW-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz reassigned ARROW-5690: - Assignee: Philipp Moritz > [Packaging][Python] macOS wheels broken: libprotobuf.18.dylib missing > - > > Key: ARROW-5690 > URL: https://issues.apache.org/jira/browse/ARROW-5690 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Blocker > Fix For: 0.14.0 > > > If I build macOS arrow wheels with crossbow from the latest master > (a77257f4790c562dcb74724fc4a22c157ab36018) and install them, importing > pyarrow gives the following error message: > {code:java} > In [1]: import pyarrow > > > --- > ImportError Traceback (most recent call last) > in > > 1 import pyarrow > ~/anaconda3/lib/python3.6/site-packages/pyarrow/__init__.py in > 47 import pyarrow.compat as compat > 48 > ---> 49 from pyarrow.lib import cpu_count, set_cpu_count > 50 from pyarrow.lib import (null, bool_, > 51 int8, int16, int32, int64, > ImportError: > dlopen(/Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, > 2): Library not loaded: /usr/local/opt/protobuf/lib/libprotobuf.18.dylib > Referenced from: > /Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow.14.dylib > Reason: image not found{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5690) [Packaging] macOS wheels broken: libprotobuf.18.dylib missing
Philipp Moritz created ARROW-5690: - Summary: [Packaging] macOS wheels broken: libprotobuf.18.dylib missing Key: ARROW-5690 URL: https://issues.apache.org/jira/browse/ARROW-5690 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz If I build macOS arrow wheels with crossbow from the latest master (a77257f4790c562dcb74724fc4a22c157ab36018) and install them, importing pyarrow gives the following error message: {code:java} In [1]: import pyarrow --- ImportError Traceback (most recent call last) in > 1 import pyarrow ~/anaconda3/lib/python3.6/site-packages/pyarrow/__init__.py in 47 import pyarrow.compat as compat 48 ---> 49 from pyarrow.lib import cpu_count, set_cpu_count 50 from pyarrow.lib import (null, bool_, 51 int8, int16, int32, int64, ImportError: dlopen(/Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so, 2): Library not loaded: /usr/local/opt/protobuf/lib/libprotobuf.18.dylib Referenced from: /Users/pcmoritz/anaconda3/lib/python3.6/site-packages/pyarrow/libarrow.14.dylib Reason: image not found{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5670) [crossbow] mac os python 3.5 wheel failing
[ https://issues.apache.org/jira/browse/ARROW-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868898#comment-16868898 ] Philipp Moritz commented on ARROW-5670: --- Sounds good! The following patch [https://github.com/apache/arrow/compare/master...pcmoritz:python-urllib] and installing `pip install requests[security]` fixed it for me. > [crossbow] mac os python 3.5 wheel failing > -- > > Key: ARROW-5670 > URL: https://issues.apache.org/jira/browse/ARROW-5670 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Priority: Major > > Currently the macOS python 3.5 is failing with > {code:java} > Downloading Apache Thrift from Traceback (most recent call last): > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", > line 1254, in do_open > h.request(req.get_method(), req.selector, req.data, headers) > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", > line 1107, in request > self._send_request(method, url, body, headers) > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", > line 1152, in _send_request > self.endheaders(body) > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", > line 1103, in endheaders > self._send_output(message_body) > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", > line 934, in _send_output > self.send(msg) > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", > line 877, in send > self.connect() > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", > line 1261, in connect > server_hostname=server_hostname) > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", > line 385, in wrap_socket > _context=self) > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", > line 760, in 
__init__ > self.do_handshake() > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", > line 996, in do_handshake > self._sslobj.do_handshake() > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", > line 641, in do_handshake > self._sslobj.do_handshake() > ssl.SSLError: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol > version (_ssl.c:719){code} > I've been looking into this error and will try to push a fix (the openssl > version that is used with python 3.5 on macos is too old I think). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5670) [crossbow] mac os python 3.5 wheel failing
[ https://issues.apache.org/jira/browse/ARROW-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868894#comment-16868894 ] Philipp Moritz commented on ARROW-5670: --- Ah ok great! If you haven't looked into this particular issue I'm happy to take it, it can be fixed by replacing urllib with the requests library in one of the helper scripts. > [crossbow] mac os python 3.5 wheel failing > -- > > Key: ARROW-5670 > URL: https://issues.apache.org/jira/browse/ARROW-5670 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Priority: Major > > Currently the macOS python 3.5 is failing with > {code:java} > Downloading Apache Thrift from Traceback (most recent call last): > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", > line 1254, in do_open > h.request(req.get_method(), req.selector, req.data, headers) > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", > line 1107, in request > self._send_request(method, url, body, headers) > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", > line 1152, in _send_request > self.endheaders(body) > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", > line 1103, in endheaders > self._send_output(message_body) > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", > line 934, in _send_output > self.send(msg) > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", > line 877, in send > self.connect() > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", > line 1261, in connect > server_hostname=server_hostname) > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", > line 385, in wrap_socket > _context=self) > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", > line 760, 
in __init__ > self.do_handshake() > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", > line 996, in do_handshake > self._sslobj.do_handshake() > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", > line 641, in do_handshake > self._sslobj.do_handshake() > ssl.SSLError: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol > version (_ssl.c:719){code} > I've been looking into this error and will try to push a fix (the openssl > version that is used with python 3.5 on macos is too old I think). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
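The eventual fix above was switching the helper script from urllib to `requests[security]` (which pulls in pyOpenSSL). A stdlib-only alternative, on an interpreter whose linked OpenSSL actually supports modern TLS, is to configure the TLS floor explicitly rather than relying on defaults. A minimal sketch, assuming Python 3.7+ for `ssl.TLSVersion`; `make_tls12_opener` is a hypothetical helper name, not part of the Arrow build scripts:

```python
import ssl
import urllib.request

def make_tls12_opener():
    # Build an opener that refuses anything below TLS 1.2. If the
    # interpreter's OpenSSL is too old to speak TLS 1.2 (the root cause
    # of the TLSV1_ALERT_PROTOCOL_VERSION failure above), this fails
    # loudly at connect time instead of with an opaque handshake alert.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return urllib.request.build_opener(
        urllib.request.HTTPSHandler(context=ctx))

opener = make_tls12_opener()
```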
[jira] [Created] (ARROW-5671) [crossbow] mac os python wheels failing
Philipp Moritz created ARROW-5671: - Summary: [crossbow] mac os python wheels failing Key: ARROW-5671 URL: https://issues.apache.org/jira/browse/ARROW-5671 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz The building of (all?) macOS python wheels is currently failing with {code:java} Traceback (most recent call last): File "", line 3, in File "/Users/travis/build/pcmoritz/crossbow/venv/lib/python3.7/site-packages/pyarrow/__init__.py", line 49, in from pyarrow.lib import cpu_count, set_cpu_count ImportError: dlopen(/Users/travis/build/pcmoritz/crossbow/venv/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-darwin.so, 2): Library not loaded: @rpath/libarrow_boost_system.dylib Referenced from: /Users/travis/build/pcmoritz/crossbow/venv/lib/python3.7/site-packages/pyarrow/libarrow.14.dylib Reason: image not found{code} Not sure where this was introduced :( -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5670) [crossbow] mac os python 3.5 wheel failing
Philipp Moritz created ARROW-5670: - Summary: [crossbow] mac os python 3.5 wheel failing Key: ARROW-5670 URL: https://issues.apache.org/jira/browse/ARROW-5670 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz Currently the macOS python 3.5 is failing with {code:java} Downloading Apache Thrift from Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 1254, in do_open h.request(req.get_method(), req.selector, req.data, headers) File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 1107, in request self._send_request(method, url, body, headers) File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 1152, in _send_request self.endheaders(body) File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 1103, in endheaders self._send_output(message_body) File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 934, in _send_output self.send(msg) File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 877, in send self.connect() File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 1261, in connect server_hostname=server_hostname) File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 385, in wrap_socket _context=self) File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 760, in __init__ self.do_handshake() File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 996, in do_handshake self._sslobj.do_handshake() File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 641, in do_handshake self._sslobj.do_handshake() ssl.SSLError: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:719){code} I've been looking into this error and 
will try to push a fix (I think the OpenSSL version used with Python 3.5 on macOS is too old). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5669) [crossbow] manylinux1 wheel building failing
Philipp Moritz created ARROW-5669: - Summary: [crossbow] manylinux1 wheel building failing Key: ARROW-5669 URL: https://issues.apache.org/jira/browse/ARROW-5669 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz I tried to set up a crossbow queue (on a0e1fbb9ef51d05a3f28e221cf8c5d4031a50c93), and right now building the manylinux1 wheels seems to be failing because of the arrow flight tests: {code:java} ___ test_tls_do_get def test_tls_do_get(): """Try a simple do_get call over TLS.""" table = simple_ints_table() > certs = example_tls_certs() usr/local/lib/python3.6/site-packages/pyarrow/tests/test_flight.py:563: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ usr/local/lib/python3.6/site-packages/pyarrow/tests/test_flight.py:64: in example_tls_certs "root_cert": read_flight_resource("root-ca.pem"), usr/local/lib/python3.6/site-packages/pyarrow/tests/test_flight.py:48: in read_flight_resource root = resource_root() _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ def resource_root(): """Get the path to the test resources directory.""" if not os.environ.get("ARROW_TEST_DATA"): > raise RuntimeError("Test resources not found; set " "ARROW_TEST_DATA to /testing") E RuntimeError: Test resources not found; set ARROW_TEST_DATA to /testing usr/local/lib/python3.6/site-packages/pyarrow/tests/test_flight.py:41: RuntimeError{code} This may have been introduced in [https://github.com/apache/arrow/pull/4594]. Any thoughts on how we should proceed with this? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-5283) [C++][Plasma] Server crash when creating an aborted object 3 times
[ https://issues.apache.org/jira/browse/ARROW-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-5283. --- Resolution: Fixed Issue resolved by pull request 4272 [https://github.com/apache/arrow/pull/4272] > [C++][Plasma] Server crash when creating an aborted object 3 times > -- > > Key: ARROW-5283 > URL: https://issues.apache.org/jira/browse/ARROW-5283 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.13.0 >Reporter: shengjun.li >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > cpp/CMakeLists.txt > option(ARROW_PLASMA "Build the plasma object store along with Arrow" ON) > sequence: > (1) call PlasmaClient::Create(id_object, data_size, 0, 0, , 0) > (2) call PlasmaClient::Release(id_object) > (3) call PlasmaClient::Abort(id_object) > (4) call PlasmaClient::Create(id_object, data_size, 0, 0, , 0) // where > the id_object is the same as (1) > (5) call PlasmaClient::Release(id_object) > (6) call PlasmaClient::Abort(id_object) > (7) call PlasmaClient::Create(id_object, data_size, 0, 0, , 0) // where > the id_object is the same as (1) > server crash! 
> F0508 10:03:09.546859 32587 eviction_policy.cc:27] Check failed: it == > item_map_.end() > *** Check failure stack trace: *** > *** Aborted at 1557280989 (unix time) try "date -d @1557280989" if you are > using GNU date *** > PC: @ 0x7f5403a46428 gsignal > *** SIGABRT (@0x3e87f4b) received by PID 32587 (TID 0x7f5406950f80) from > PID 32587; stack trace: *** > @ 0x7f5403dec390 (unknown) > @ 0x7f5403a46428 gsignal > @ 0x7f5403a4802a abort > @ 0x7f5405780f69 google::logging_fail() > @ 0x7f5405782a3d google::LogMessage::Fail() > @ 0x7f5405785054 google::LogMessage::SendToLog() > @ 0x7f540578255b google::LogMessage::Flush() > @ 0x7f5405782779 google::LogMessage::~LogMessage() > @ 0x7f54053f98bd arrow::util::ArrowLog::~ArrowLog() > @ 0x4afcae plasma::LRUCache::Add() > @ 0x4b00f1 plasma::EvictionPolicy::ObjectCreated() > @ 0x4b61e0 plasma::PlasmaStore::CreateObject() > @ 0x4babcc plasma::PlasmaStore::ProcessMessage() > @ 0x4b95c3 _ZZN6plasma11PlasmaStore13ConnectClientEiENKUliE_clEi > @ 0x4bdb80 > _ZNSt17_Function_handlerIFviEZN6plasma11PlasmaStore13ConnectClientEiEUliE_E9_M_invokeERKSt9_Any_dataOi > @ 0x4aba58 std::function<>::operator()() > @ 0x4aaf67 plasma::EventLoop::FileEventCallback() > @ 0x4dc1bd aeProcessEvents > @ 0x4dc37e aeMain > @ 0x4ab25b plasma::EventLoop::Start() > @ 0x4c00c1 plasma::PlasmaStoreRunner::Start() > @ 0x4bc77b plasma::StartServer() > @ 0x4bd3eb main > @ 0x7f5403a31830 __libc_start_main > @ 0x49e9f9 _start > @ 0x0 (unknown) > Aborted (core dumped) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-5186) [Plasma] Crash on deleting CUDA memory
[ https://issues.apache.org/jira/browse/ARROW-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-5186. --- Resolution: Fixed Fix Version/s: 0.14.0 Issue resolved by pull request 4177 [https://github.com/apache/arrow/pull/4177] > [Plasma] Crash on deleting CUDA memory > -- > > Key: ARROW-5186 > URL: https://issues.apache.org/jira/browse/ARROW-5186 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.13.0 >Reporter: shengjun.li >Assignee: shengjun.li >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > cpp/CMakeLists.txt > option(ARROW_CUDA "Build the Arrow CUDA extensions (requires CUDA toolkit)" > ON) > option(ARROW_PLASMA "Build the plasma object store along with Arrow" ON) > [sample sequence] > (1) call PlasmaClient::Create(id_object, data_size, 0, 0, , 1) // where > device_num != 0 > (2) call PlasmaClient::Seal(id_object) > (3) call PlasmaClient::Release(id_object) > (4) call PlasmaClient::Delete(id_object) // server carsh! 
> *** Aborted at 1555645923 (unix time) try "date -d @1555645923" if you are > using GNU date *** > PC: @ 0x7f65bcfa1428 gsignal > *** SIGABRT (@0x3e86d67) received by PID 28007 (TID 0x7f65bf225740) from > PID 28007; stack trace: *** > @ 0x7f65bd347390 (unknown) > @ 0x7f65bcfa1428 gsignal > @ 0x7f65bcfa302a abort > @ 0x4a56cd dlfree > @ 0x4b4bc2 plasma::PlasmaAllocator::Free() > @ 0x4b7da3 plasma::PlasmaStore::EraseFromObjectTable() > @ 0x4b87d2 plasma::PlasmaStore::DeleteObject() > @ 0x4bb3d2 plasma::PlasmaStore::ProcessMessage() > @ 0x4b9195 _ZZN6plasma11PlasmaStore13ConnectClientEiENKUliE_clEi > @ 0x4bd752 > _ZNSt17_Function_handlerIFviEZN6plasma11PlasmaStore13ConnectClientEiEUliE_E9_M_invokeERKSt9_Any_dataOi > @ 0x4ab998 std::function<>::operator()() > @ 0x4aaea7 plasma::EventLoop::FileEventCallback() > @ 0x4dbd8f aeProcessEvents > @ 0x4dbf50 aeMain > @ 0x4ab19b plasma::EventLoop::Start() > @ 0x4bfc93 plasma::PlasmaStoreRunner::Start() > @ 0x4bc34d plasma::StartServer() > @ 0x4bcfbd main > @ 0x7f65bcf8c830 __libc_start_main > @ 0x49e939 _start > @ 0x0 (unknown) > Aborted (core dumped) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5027) [Python] Add JSON Reader
Philipp Moritz created ARROW-5027: - Summary: [Python] Add JSON Reader Key: ARROW-5027 URL: https://issues.apache.org/jira/browse/ARROW-5027 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Philipp Moritz Add bindings for the JSON reader. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-4939) [Python] Add wrapper for "sum" kernel
[ https://issues.apache.org/jira/browse/ARROW-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-4939. --- Resolution: Fixed Fix Version/s: 0.13.0 Issue resolved by pull request 3954 [https://github.com/apache/arrow/pull/3954] > [Python] Add wrapper for "sum" kernel > - > > Key: ARROW-4939 > URL: https://issues.apache.org/jira/browse/ARROW-4939 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Affects Versions: 0.12.1 >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Add pyarrow wrappers for the sum compute kernel. > For this we also need to add wrappers for the new arrow::Scalar types and > appropriate conversions from Datum. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5022) [C++] Implement more "Datum" types for AggregateKernel
Philipp Moritz created ARROW-5022: - Summary: [C++] Implement more "Datum" types for AggregateKernel Key: ARROW-5022 URL: https://issues.apache.org/jira/browse/ARROW-5022 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz Currently the AggregateKernel gives the following error if the datum isn't an array: {code:java} AggregateKernel expects Array datum{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5002) [C++] Implement GroupBy
Philipp Moritz created ARROW-5002: - Summary: [C++] Implement GroupBy Key: ARROW-5002 URL: https://issues.apache.org/jira/browse/ARROW-5002 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz Dear all, I wonder what the best way forward is for implementing GroupBy kernels. Initially this was part of https://issues.apache.org/jira/browse/ARROW-4124 but is not contained in the current implementation as far as I can tell. It seems that the part of group-by that just returns indices could be conveniently implemented with the HashKernel, which seems useful in any case. Is that indeed the best way forward? GroupBy + Aggregate could then be implemented either on top of that (indices + the Take kernel + aggregation, which involves more memory copies than necessary) or directly inside the aggregate kernel. The latter is probably preferred; any thoughts on that? Am I missing any other JIRAs related to this? Best, Philipp. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
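The two-stage composition discussed in that issue (hash the keys to group indices, then take + aggregate) can be sketched in pure Python. This models the semantics only; `group_indices` and `group_by_sum` are illustrative names, not Arrow's C++ kernel API:

```python
from collections import defaultdict

def group_indices(keys):
    # Stage 1: the "indices" half of a group-by, analogous to what a
    # hash kernel could produce — one list of row indices per distinct key.
    groups = defaultdict(list)
    for i, key in enumerate(keys):
        groups[key].append(i)
    return dict(groups)

def group_by_sum(keys, values):
    # Stage 2: gather (take) each group's values by index and aggregate.
    # Materializing the gathered values per group is where the extra
    # memory copies the issue mentions would come from.
    return {key: sum(values[i] for i in idx)
            for key, idx in group_indices(keys).items()}
```

A fused aggregate kernel would instead update one running sum per key in a single pass over the input, skipping the intermediate gather entirely.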
[jira] [Created] (ARROW-4983) [Plasma] Unmap memory when the client is destroyed
Philipp Moritz created ARROW-4983: - Summary: [Plasma] Unmap memory when the client is destroyed Key: ARROW-4983 URL: https://issues.apache.org/jira/browse/ARROW-4983 Project: Apache Arrow Issue Type: Improvement Components: C++ - Plasma Affects Versions: 0.12.1 Reporter: Philipp Moritz Assignee: Philipp Moritz Currently the plasma memory mapped into the client is not unmapped upon destruction of the client, which can cause memory mapped files to be kept around longer than necessary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
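The fix described above amounts to tying the lifetime of the memory mapping to the client object. A minimal sketch of that pattern using Python's `mmap`; `MappedClient` is a hypothetical illustration of the idea, not the Plasma client API:

```python
import mmap

class MappedClient:
    """Owns a memory mapping and unmaps it deterministically on close,
    so the backing file is not kept alive longer than necessary."""

    def __init__(self, path, size):
        self._f = open(path, "r+b")
        self._mm = mmap.mmap(self._f.fileno(), size)

    @property
    def closed(self):
        return self._mm.closed

    def close(self):
        # Unmap first, then close the fd; after this the kernel may
        # reclaim the file's pages.
        self._mm.close()
        self._f.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
```

Used as a context manager, the mapping is released as soon as the block exits rather than whenever the garbage collector happens to run.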
[jira] [Updated] (ARROW-4958) [C++] Purely static linking broken
[ https://issues.apache.org/jira/browse/ARROW-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz updated ARROW-4958: -- Component/s: C++ > [C++] Purely static linking broken > -- > > Key: ARROW-4958 > URL: https://issues.apache.org/jira/browse/ARROW-4958 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.12.1 >Reporter: Philipp Moritz >Priority: Major > > On the current master, 816c10d030842a1a0da4d00f95a5e3749c86a74f (#3965), > running > > {code:java} > docker-compose build cpp > docker-compose run cpp-static-only{code} > yields > {code:java} > [357/382] Linking CXX executable debug/parquet-encoding-benchmark > FAILED: debug/parquet-encoding-benchmark > : && /opt/conda/bin/ccache /usr/bin/g++ -Wno-noexcept-type > -fdiagnostics-color=always -ggdb -O0 -Wall -Wno-conversion > -Wno-sign-conversion -Werror -msse4.2 -g -rdynamic > src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o > -o debug/parquet-encoding-benchmark -Wl,-rpath,/opt/conda/lib > /opt/conda/lib/libbenchmark_main.a debug/libparquet.a > /opt/conda/lib/libbenchmark.a debug/libarrow.a > /opt/conda/lib/libdouble-conversion.a /opt/conda/lib/libbrotlienc.so > /opt/conda/lib/libbrotlidec.so /opt/conda/lib/libbrotlicommon.so > /opt/conda/lib/libbz2.so /opt/conda/lib/liblz4.so > /opt/conda/lib/libsnappy.so.1.1.7 /opt/conda/lib/libz.so > /opt/conda/lib/libzstd.so orc_ep-install/lib/liborc.a > /opt/conda/lib/libprotobuf.so /opt/conda/lib/libglog.so > /opt/conda/lib/libboost_system.so /opt/conda/lib/libboost_filesystem.so > jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc_pic.a -pthread -lrt > /opt/conda/lib/libboost_regex.so /opt/conda/lib/libthrift.so && : > src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: > In function `testing::AssertionResult::AppendMessage(testing::Message > const&)': > /opt/conda/include/gtest/gtest.h:352: undefined reference to > 
`testing::Message::GetString[abi:cxx11]() const' > src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: > In function `parquet::BenchmarkDecodeArrow::InitDataInputs()': > /arrow/cpp/src/parquet/encoding-benchmark.cc:201: undefined reference to > `arrow::random::RandomArrayGenerator::StringWithRepeats(long, long, int, int, > double)' > src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: > In function `parquet::BM_DictDecodingByteArray::DoEncodeData()': > /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to > `testing::internal::AlwaysTrue()' > /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to > `testing::internal::AlwaysTrue()' > /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to > `testing::Message::Message()' > /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to > `testing::internal::AssertHelper::AssertHelper(testing::TestPartResult::Type, > char const*, int, char const*)' > /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to > `testing::internal::AssertHelper::operator=(testing::Message const&) const' > /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to > `testing::internal::AssertHelper::~AssertHelper()' > /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to > `testing::Message::Message()' > /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to > `testing::internal::AssertHelper::AssertHelper(testing::TestPartResult::Type, > char const*, int, char const*)' > /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to > `testing::internal::AssertHelper::operator=(testing::Message const&) const' > /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to > `testing::internal::AssertHelper::~AssertHelper()' > /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to > 
`testing::internal::AssertHelper::~AssertHelper()' > /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to > `testing::internal::AssertHelper::~AssertHelper()' > src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: > In function `testing::internal::scoped_ptr std::char_traits, std::allocator > > >::reset(std::__cxx11::basic_string, > std::allocator >*)': > /opt/conda/include/gtest/internal/gtest-port.h:1215: undefined reference to > `testing::internal::IsTrue(bool)' > src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: > In function `testing::AssertionResult > testing::internal::CmpHelperNE > >*, decltype(nullptr)>(char const*, char const*, > parquet::DictEncoder >* const&, > decltype(nullptr) const&)': > /opt/conda/include/gtest/gtest.h:1573: undefined
[jira] [Updated] (ARROW-4958) [C++] Purely static linking broken
[ https://issues.apache.org/jira/browse/ARROW-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz updated ARROW-4958: -- Affects Version/s: 0.12.1 > [C++] Purely static linking broken > -- > > Key: ARROW-4958 > URL: https://issues.apache.org/jira/browse/ARROW-4958 > Project: Apache Arrow > Issue Type: Improvement >Affects Versions: 0.12.1 >Reporter: Philipp Moritz >Priority: Major > > On the current master, 816c10d030842a1a0da4d00f95a5e3749c86a74f (#3965), > running > > {code:java} > docker-compose build cpp > docker-compose run cpp-static-only{code} > yields > {code:java} > [357/382] Linking CXX executable debug/parquet-encoding-benchmark > FAILED: debug/parquet-encoding-benchmark > : && /opt/conda/bin/ccache /usr/bin/g++ -Wno-noexcept-type > -fdiagnostics-color=always -ggdb -O0 -Wall -Wno-conversion > -Wno-sign-conversion -Werror -msse4.2 -g -rdynamic > src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o > -o debug/parquet-encoding-benchmark -Wl,-rpath,/opt/conda/lib > /opt/conda/lib/libbenchmark_main.a debug/libparquet.a > /opt/conda/lib/libbenchmark.a debug/libarrow.a > /opt/conda/lib/libdouble-conversion.a /opt/conda/lib/libbrotlienc.so > /opt/conda/lib/libbrotlidec.so /opt/conda/lib/libbrotlicommon.so > /opt/conda/lib/libbz2.so /opt/conda/lib/liblz4.so > /opt/conda/lib/libsnappy.so.1.1.7 /opt/conda/lib/libz.so > /opt/conda/lib/libzstd.so orc_ep-install/lib/liborc.a > /opt/conda/lib/libprotobuf.so /opt/conda/lib/libglog.so > /opt/conda/lib/libboost_system.so /opt/conda/lib/libboost_filesystem.so > jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc_pic.a -pthread -lrt > /opt/conda/lib/libboost_regex.so /opt/conda/lib/libthrift.so && : > src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: > In function `testing::AssertionResult::AppendMessage(testing::Message > const&)': > /opt/conda/include/gtest/gtest.h:352: undefined reference to > 
`testing::Message::GetString[abi:cxx11]() const' > src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: > In function `parquet::BenchmarkDecodeArrow::InitDataInputs()': > /arrow/cpp/src/parquet/encoding-benchmark.cc:201: undefined reference to > `arrow::random::RandomArrayGenerator::StringWithRepeats(long, long, int, int, > double)' > src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: > In function `parquet::BM_DictDecodingByteArray::DoEncodeData()': > /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to > `testing::internal::AlwaysTrue()' > /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to > `testing::internal::AlwaysTrue()' > /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to > `testing::Message::Message()' > /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to > `testing::internal::AssertHelper::AssertHelper(testing::TestPartResult::Type, > char const*, int, char const*)' > /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to > `testing::internal::AssertHelper::operator=(testing::Message const&) const' > /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to > `testing::internal::AssertHelper::~AssertHelper()' > /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to > `testing::Message::Message()' > /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to > `testing::internal::AssertHelper::AssertHelper(testing::TestPartResult::Type, > char const*, int, char const*)' > /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to > `testing::internal::AssertHelper::operator=(testing::Message const&) const' > /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to > `testing::internal::AssertHelper::~AssertHelper()' > /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to > 
`testing::internal::AssertHelper::~AssertHelper()' > /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to > `testing::internal::AssertHelper::~AssertHelper()' > src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: > In function `testing::internal::scoped_ptr std::char_traits, std::allocator > > >::reset(std::__cxx11::basic_string, > std::allocator >*)': > /opt/conda/include/gtest/internal/gtest-port.h:1215: undefined reference to > `testing::internal::IsTrue(bool)' > src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: > In function `testing::AssertionResult > testing::internal::CmpHelperNE > >*, decltype(nullptr)>(char const*, char const*, > parquet::DictEncoder >* const&, > decltype(nullptr) const&)': > /opt/conda/include/gtest/gtest.h:1573: undefined reference to >
[jira] [Created] (ARROW-4958) [C++] Purely static linking broken
Philipp Moritz created ARROW-4958: - Summary: [C++] Purely static linking broken Key: ARROW-4958 URL: https://issues.apache.org/jira/browse/ARROW-4958 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz On the current master, 816c10d030842a1a0da4d00f95a5e3749c86a74f (#3965), running {code:java} docker-compose build cpp docker-compose run cpp-static-only{code} yields {code:java} [357/382] Linking CXX executable debug/parquet-encoding-benchmark FAILED: debug/parquet-encoding-benchmark : && /opt/conda/bin/ccache /usr/bin/g++ -Wno-noexcept-type -fdiagnostics-color=always -ggdb -O0 -Wall -Wno-conversion -Wno-sign-conversion -Werror -msse4.2 -g -rdynamic src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o -o debug/parquet-encoding-benchmark -Wl,-rpath,/opt/conda/lib /opt/conda/lib/libbenchmark_main.a debug/libparquet.a /opt/conda/lib/libbenchmark.a debug/libarrow.a /opt/conda/lib/libdouble-conversion.a /opt/conda/lib/libbrotlienc.so /opt/conda/lib/libbrotlidec.so /opt/conda/lib/libbrotlicommon.so /opt/conda/lib/libbz2.so /opt/conda/lib/liblz4.so /opt/conda/lib/libsnappy.so.1.1.7 /opt/conda/lib/libz.so /opt/conda/lib/libzstd.so orc_ep-install/lib/liborc.a /opt/conda/lib/libprotobuf.so /opt/conda/lib/libglog.so /opt/conda/lib/libboost_system.so /opt/conda/lib/libboost_filesystem.so jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc_pic.a -pthread -lrt /opt/conda/lib/libboost_regex.so /opt/conda/lib/libthrift.so && : src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: In function `testing::AssertionResult::AppendMessage(testing::Message const&)': /opt/conda/include/gtest/gtest.h:352: undefined reference to `testing::Message::GetString[abi:cxx11]() const' src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: In function `parquet::BenchmarkDecodeArrow::InitDataInputs()': /arrow/cpp/src/parquet/encoding-benchmark.cc:201: undefined reference to 
`arrow::random::RandomArrayGenerator::StringWithRepeats(long, long, int, int, double)' src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: In function `parquet::BM_DictDecodingByteArray::DoEncodeData()': /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to `testing::internal::AlwaysTrue()' /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to `testing::internal::AlwaysTrue()' /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to `testing::Message::Message()' /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to `testing::internal::AssertHelper::AssertHelper(testing::TestPartResult::Type, char const*, int, char const*)' /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to `testing::internal::AssertHelper::operator=(testing::Message const&) const' /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to `testing::internal::AssertHelper::~AssertHelper()' /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to `testing::Message::Message()' /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to `testing::internal::AssertHelper::AssertHelper(testing::TestPartResult::Type, char const*, int, char const*)' /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to `testing::internal::AssertHelper::operator=(testing::Message const&) const' /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to `testing::internal::AssertHelper::~AssertHelper()' /arrow/cpp/src/parquet/encoding-benchmark.cc:317: undefined reference to `testing::internal::AssertHelper::~AssertHelper()' /arrow/cpp/src/parquet/encoding-benchmark.cc:321: undefined reference to `testing::internal::AssertHelper::~AssertHelper()' src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: In function `testing::internal::scoped_ptr, std::allocator > >::reset(std::__cxx11::basic_string, 
std::allocator >*)': /opt/conda/include/gtest/internal/gtest-port.h:1215: undefined reference to `testing::internal::IsTrue(bool)' src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: In function `testing::AssertionResult testing::internal::CmpHelperNE >*, decltype(nullptr)>(char const*, char const*, parquet::DictEncoder >* const&, decltype(nullptr) const&)': /opt/conda/include/gtest/gtest.h:1573: undefined reference to `testing::AssertionSuccess()' src/parquet/CMakeFiles/parquet-encoding-benchmark.dir/encoding-benchmark.cc.o: In function `testing::internal::scoped_ptr, std::allocator > >::reset(std::__cxx11::basic_stringstream, std::allocator >*)': /opt/conda/include/gtest/internal/gtest-port.h:1215: undefined reference to `testing::internal::IsTrue(bool)'
[jira] [Created] (ARROW-4939) [Python] Add wrapper for "sum" kernel
Philipp Moritz created ARROW-4939: - Summary: [Python] Add wrapper for "sum" kernel Key: ARROW-4939 URL: https://issues.apache.org/jira/browse/ARROW-4939 Project: Apache Arrow Issue Type: Improvement Components: Python Affects Versions: 0.12.1 Reporter: Philipp Moritz Assignee: Philipp Moritz Add pyarrow wrappers for the sum compute kernel. For this we also need to add wrappers for the new arrow::Scalar types and appropriate conversions from Datum. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
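For intuition, the reduction such a "sum" kernel performs over a chunked column can be sketched in pure Python. This is an illustration only, not the Arrow implementation; the `ChunkedArray` and `sum_kernel` names here are stand-ins, and nulls are modeled as `None`:

```python
# Pure-Python sketch of a chunk-wise "sum" kernel: reduce each chunk
# independently, skipping nulls, then combine the partial results into a
# single scalar. Names are illustrative, not the pyarrow API.

class ChunkedArray:
    def __init__(self, chunks):
        self.chunks = chunks  # list of lists standing in for Arrow arrays

def sum_kernel(chunked):
    partials = []
    for chunk in chunked.chunks:
        # None plays the role of a null slot here.
        partials.append(sum(v for v in chunk if v is not None))
    return sum(partials)

arr = ChunkedArray([[1, 2, None], [3, 4]])
print(sum_kernel(arr))  # 10
```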
[jira] [Updated] (ARROW-4912) [C++, Python] Allow specifying column names to CSV reader
[ https://issues.apache.org/jira/browse/ARROW-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz updated ARROW-4912: -- Affects Version/s: 0.12.1 > [C++, Python] Allow specifying column names to CSV reader > - > > Key: ARROW-4912 > URL: https://issues.apache.org/jira/browse/ARROW-4912 > Project: Apache Arrow > Issue Type: Improvement >Affects Versions: 0.12.1 >Reporter: Philipp Moritz >Priority: Major > > Currently I think there is no way to specify custom column names for CSV > files. It's possible to specify the full schema of the file, but not just > column names. > See the related discussion here: ARROW-3722 > The goal of this is to re-use the CSV type-inference but still allow people > to specify custom names for the columns. As far as I know, there is currently > no way to set column names post-hoc, so we should provide a way to specify > them before reading the file. > Related to this, ParseOptions(header_rows=0) is not currently implemented. > Is there any current way to do this or does this need to be implemented? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4912) [C++, Python] Allow specifying column names to CSV reader
[ https://issues.apache.org/jira/browse/ARROW-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz updated ARROW-4912: -- Component/s: Python C++ > [C++, Python] Allow specifying column names to CSV reader > - > > Key: ARROW-4912 > URL: https://issues.apache.org/jira/browse/ARROW-4912 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Affects Versions: 0.12.1 >Reporter: Philipp Moritz >Priority: Major > > Currently I think there is no way to specify custom column names for CSV > files. It's possible to specify the full schema of the file, but not just > column names. > See the related discussion here: ARROW-3722 > The goal of this is to re-use the CSV type-inference but still allow people > to specify custom names for the columns. As far as I know, there is currently > no way to set column names post-hoc, so we should provide a way to specify > them before reading the file. > Related to this, ParseOptions(header_rows=0) is not currently implemented. > Is there any current way to do this or does this need to be implemented? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4912) [C++, Python] Allow specifying column names to CSV reader
Philipp Moritz created ARROW-4912: - Summary: [C++, Python] Allow specifying column names to CSV reader Key: ARROW-4912 URL: https://issues.apache.org/jira/browse/ARROW-4912 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz Currently I think there is no way to specify custom column names for CSV files. It's possible to specify the full schema of the file, but not just column names. See the related discussion here: ARROW-3722 The goal of this is to re-use the CSV type-inference but still allow people to specify custom names for the columns. As far as I know, there is currently no way to set column names post-hoc, so we should provide a way to specify them before reading the file. Related to this, ParseOptions(header_rows=0) is not currently implemented. Is there any current way to do this or does this need to be implemented? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
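The desired behavior, run type inference but attach user-supplied names, can be sketched with the standard library. The function name, the naive int-or-string inference, and the dict-of-columns return shape are all assumptions for illustration; they are not pyarrow's API:

```python
import csv
import io

# Sketch of the requested feature: read a header-less CSV, infer column
# types naively, and attach caller-provided column names afterwards.

def read_csv_with_names(text, column_names):
    rows = list(csv.reader(io.StringIO(text)))

    def infer(value):
        # Toy stand-in for CSV type inference: try int, else keep string.
        try:
            return int(value)
        except ValueError:
            return value

    table = {name: [] for name in column_names}
    for row in rows:
        for name, value in zip(column_names, row):
            table[name].append(infer(value))
    return table

table = read_csv_with_names("1,foo\n2,bar\n", ["id", "label"])
print(table)  # {'id': [1, 2], 'label': ['foo', 'bar']}
```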
[jira] [Commented] (ARROW-4810) [Format][C++] Add "LargeList" type with 64-bit offsets
[ https://issues.apache.org/jira/browse/ARROW-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789169#comment-16789169 ] Philipp Moritz commented on ARROW-4810: --- I agree that we should have a reference implementation in both C++ and Java. Is there anybody who can help with this? I don't personally have the expertise to do it unfortunately (and would also like to avoid this becoming a zombie project like https://issues.apache.org/jira/browse/ARROW-1692). > [Format][C++] Add "LargeList" type with 64-bit offsets > -- > > Key: ARROW-4810 > URL: https://issues.apache.org/jira/browse/ARROW-4810 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Format >Reporter: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Mentioned in https://github.com/apache/arrow/issues/3845 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
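For context, the offsets-based list layout that motivates a 64-bit variant can be illustrated in pure Python. This is a toy model of the layout, not the Arrow implementation:

```python
# A list array is stored as one flat child array of values plus an offsets
# buffer; list i spans values[offsets[i]:offsets[i+1]]. With 32-bit offsets
# the child array is capped near 2**31 - 1 values, which is the limit a
# "LargeList" type with 64-bit offsets lifts.

values = [1, 2, 3, 4, 5, 6]
offsets = [0, 2, 2, 6]  # encodes three lists: [1, 2], [], [3, 4, 5, 6]

def list_at(i):
    return values[offsets[i]:offsets[i + 1]]

print([list_at(i) for i in range(len(offsets) - 1)])
# [[1, 2], [], [3, 4, 5, 6]]
```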
[jira] [Commented] (ARROW-4757) [C++] Nested chunked array support
[ https://issues.apache.org/jira/browse/ARROW-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787476#comment-16787476 ] Philipp Moritz commented on ARROW-4757: --- Yeah, that's a good idea, I'll give that a shot. I started making ChunkedArray a subclass of Array and introduce a type so it can be serialized via the IPC mechanism, but it is a little clumsy. > [C++] Nested chunked array support > -- > > Key: ARROW-4757 > URL: https://issues.apache.org/jira/browse/ARROW-4757 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Priority: Major > Fix For: 0.14.0 > > > Dear all, > I'm currently trying to lift the 2GB limit on the python serialization. For > this, I implemented a chunked union builder to split the array into smaller > arrays. > However, some of the children of the union array can be ListArrays, which can > themselves contain UnionArrays which can contain ListArrays etc. I'm at a bit > of a loss how to handle this. In principle I'd like to chunk the children > too. However, currently UnionArrays can only have children of type Array, and > there is no way to treat a chunked array (which is a vector of Arrays) as an > Array to store it as a child of a UnionArray. Any ideas how to best support > this use case? > -- Philipp. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4797) [Plasma] Avoid store crash if not enough memory is available
Philipp Moritz created ARROW-4797: - Summary: [Plasma] Avoid store crash if not enough memory is available Key: ARROW-4797 URL: https://issues.apache.org/jira/browse/ARROW-4797 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz Currently, the plasma server exits with a fatal check if not enough memory is available. This can lead to errors that are hard to diagnose, see [https://github.com/ray-project/ray/issues/3670] Instead, we should keep the store alive in these circumstances, taking up some of the remaining memory, and allow the client to check if enough memory has been allocated. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4797) [Plasma] Avoid store crash if not enough memory is available
[ https://issues.apache.org/jira/browse/ARROW-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz updated ARROW-4797: -- Component/s: C++ - Plasma > [Plasma] Avoid store crash if not enough memory is available > > > Key: ARROW-4797 > URL: https://issues.apache.org/jira/browse/ARROW-4797 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Reporter: Philipp Moritz >Priority: Major > > Currently, the plasma server exits with a fatal check if not enough memory > is available. This can lead to errors that are hard to diagnose, see > [https://github.com/ray-project/ray/issues/3670] > Instead, we should keep the store alive in these circumstances, taking up > some of the remaining memory, and allow the client to check if enough memory > has been allocated. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-4797) [Plasma] Avoid store crash if not enough memory is available
[ https://issues.apache.org/jira/browse/ARROW-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz reassigned ARROW-4797: - Assignee: Philipp Moritz > [Plasma] Avoid store crash if not enough memory is available > > > Key: ARROW-4797 > URL: https://issues.apache.org/jira/browse/ARROW-4797 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Major > > Currently, the plasma server exits with a fatal check if not enough memory > is available. This can lead to errors that are hard to diagnose, see > [https://github.com/ray-project/ray/issues/3670] > Instead, we should keep the store alive in these circumstances, taking up > some of the remaining memory, and allow the client to check if enough memory > has been allocated. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4757) Nested chunked array support
[ https://issues.apache.org/jira/browse/ARROW-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783141#comment-16783141 ] Philipp Moritz commented on ARROW-4757: --- One possible way to solve this is to make ChunkedArray a first class citizen, i.e. make it a subclass of Array and allow it to participate in IPC. Then the UnionArray could just have a ChunkedArray as a child to solve the above issue. > Nested chunked array support > > > Key: ARROW-4757 > URL: https://issues.apache.org/jira/browse/ARROW-4757 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Priority: Major > > Dear all, > I'm currently trying to lift the 2GB limit on the python serialization. For > this, I implemented a chunked union builder to split the array into smaller > arrays. > However, some of the children of the union array can be ListArrays, which can > themselves contain UnionArrays which can contain ListArrays etc. I'm at a bit > of a loss how to handle this. In principle I'd like to chunk the children > too. However, currently UnionArrays can only have children of type Array, and > there is no way to treat a chunked array (which is a vector of Arrays) as an > Array to store it as a child of a UnionArray. Any ideas how to best support > this use case? > -- Philipp. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4757) Nested chunked array support
Philipp Moritz created ARROW-4757: - Summary: Nested chunked array support Key: ARROW-4757 URL: https://issues.apache.org/jira/browse/ARROW-4757 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz Dear all, I'm currently trying to lift the 2GB limit on the python serialization. For this, I implemented a chunked union builder to split the array into smaller arrays. However, some of the children of the union array can be ListArrays, which can themselves contain UnionArrays which can contain ListArrays etc. I'm at a bit of a loss how to handle this. In principle I'd like to chunk the children too. However, currently UnionArrays can only have children of type Array, and there is no way to treat a chunked array (which is a vector of Arrays) as an Array to store it as a child of a UnionArray. Any ideas how to best support this use case? -- Philipp. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
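The chunking idea described above, splitting data so no single array crosses the 2 GB boundary, reduces to a simple partitioning step. A minimal sketch (the helper name and item-count limit are illustrative; the real builder works on byte sizes):

```python
# Split a large sequence into chunks no bigger than max_items, so that each
# chunk can be built as its own array under the size limit. A chunked union
# builder would apply the same idea per child.

def chunk(seq, max_items):
    return [seq[i:i + max_items] for i in range(0, len(seq), max_items)]

chunks = chunk(list(range(10)), 4)
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```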
[jira] [Created] (ARROW-4690) Building TensorFlow compatible wheels for Arrow
Philipp Moritz created ARROW-4690: - Summary: Building TensorFlow compatible wheels for Arrow Key: ARROW-4690 URL: https://issues.apache.org/jira/browse/ARROW-4690 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz Since the inclusion of LLVM, arrow wheels stopped working with TensorFlow again (on some configurations at least). While we are continuing to discuss a more permanent solution in [https://groups.google.com/a/tensorflow.org/d/topic/developers/TMqRaT-H2bI/discussion], I made some progress in creating TensorFlow-compatible wheels for an unmodified pyarrow. They won't adhere to the manylinux1 standard, but they should be as compatible as the TensorFlow wheels because they use the same build environment (ubuntu 14.04). I'll create a PR with the necessary changes. I don't propose to ship these wheels but it might be a good idea to include the docker image and instructions on how to build them in the tree for organizations that want to use TensorFlow with pyarrow on top of pip. The official recommendation should probably be to use conda if the average user wants to do this for now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4491) [Python] Remove usage of std::to_string and std::stoi
[ https://issues.apache.org/jira/browse/ARROW-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761501#comment-16761501 ] Philipp Moritz commented on ARROW-4491: --- Ok, I think I understand this now. On some implementations, int8_t seems to be a typedef to char and the conversion in this case produces a character and not a number. > [Python] Remove usage of std::to_string and std::stoi > - > > Key: ARROW-4491 > URL: https://issues.apache.org/jira/browse/ARROW-4491 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Not sure why this is happening, but for some older compilers I'm seeing > {code:java} > terminate called after throwing an instance of 'std::invalid_argument' > what(): stoi{code} > since > [https://github.com/apache/arrow/pull/3423] > Possible cause is that there is no int8_t version of > [https://en.cppreference.com/w/cpp/string/basic_string/to_string] > so it might not convert it to a proper string representation of the number. > Any insight on why this could be happening is appreciated. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1266) [Plasma] Move heap allocations to arrow memory pool
[ https://issues.apache.org/jira/browse/ARROW-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761440#comment-16761440 ] Philipp Moritz commented on ARROW-1266: --- At a quick glance, the only structure that is still allocated with new is GetRequest here: [https://github.com/apache/arrow/blob/0c55b25c84119af59320eab0b0625da9ce987294/cpp/src/plasma/store.cc#L404] The control flow is pretty clear and it is deleted here: [https://github.com/apache/arrow/blob/0c55b25c84119af59320eab0b0625da9ce987294/cpp/src/plasma/store.cc#L296] However if somebody wants to make it a unique_ptr or shared_ptr ([~suquark]?) I wouldn't mind. > [Plasma] Move heap allocations to arrow memory pool > --- > > Key: ARROW-1266 > URL: https://issues.apache.org/jira/browse/ARROW-1266 > Project: Apache Arrow > Issue Type: Bug >Reporter: Philipp Moritz >Priority: Major > Fix For: 0.13.0 > > > At the moment we are allocating memory with std::vectors and even new in some > places, this should be cleaned up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4491) [Python] Remove usage of std::to_string and std::stoi
Philipp Moritz created ARROW-4491: - Summary: [Python] Remove usage of std::to_string and std::stoi Key: ARROW-4491 URL: https://issues.apache.org/jira/browse/ARROW-4491 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz Not sure why this is happening, but for some older compilers I'm seeing {code:java} terminate called after throwing an instance of 'std::invalid_argument' what(): stoi{code} since [https://github.com/apache/arrow/pull/3423] Possible cause is that there is no int8_t version of [https://en.cppreference.com/w/cpp/string/basic_string/to_string] so it might not convert it to a proper string representation of the number. Any insight on why this could be happening is appreciated. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
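The failure mode diagnosed in this issue, an 8-bit integer being formatted through a character overload instead of a numeric one, can be illustrated by analogy in Python:

```python
# When int8_t is a typedef for char, formatting the value picks the char
# overload: you get the character with that code point rather than its
# decimal text. Python's str() vs chr() shows the same distinction.

value = 65
as_number = str(value)  # what std::to_string(int) would produce
as_char = chr(value)    # what the char overload effectively produces
print(as_number, as_char)  # 65 A
```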
[jira] [Created] (ARROW-4475) [Python] Serializing objects that contain themselves
Philipp Moritz created ARROW-4475: - Summary: [Python] Serializing objects that contain themselves Key: ARROW-4475 URL: https://issues.apache.org/jira/browse/ARROW-4475 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz This is a regression from [https://github.com/apache/arrow/pull/3423] The following segfaults: {code:java} import pyarrow as pa lst = [] lst.append(lst) pa.serialize(lst){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
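The segfault happens because a naive recursive serializer never terminates on a self-referencing container. A common remedy, sketched here in pure Python with illustrative names (a real implementation would also distinguish shared, non-cyclic references from true cycles), is to track visited object ids and bail out instead of recursing forever:

```python
# Minimal sketch: detect a cycle during recursive serialization by
# remembering the ids of lists already on the current path.

def serialize(obj, _seen=None):
    seen = _seen if _seen is not None else set()
    if isinstance(obj, list):
        if id(obj) in seen:
            raise ValueError("self-referential object detected")
        seen.add(id(obj))
        return ["list", [serialize(x, seen) for x in obj]]
    return ["scalar", obj]

lst = []
lst.append(lst)
try:
    serialize(lst)
except ValueError as e:
    print(e)  # self-referential object detected
```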
[jira] [Resolved] (ARROW-4455) [Plasma] g++ 8 reports class-memaccess warnings
[ https://issues.apache.org/jira/browse/ARROW-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-4455. --- Resolution: Fixed Fix Version/s: 0.13.0 Issue resolved by pull request 3543 [https://github.com/apache/arrow/pull/3543] > [Plasma] g++ 8 reports class-memaccess warnings > --- > > Key: ARROW-4455 > URL: https://issues.apache.org/jira/browse/ARROW-4455 > Project: Apache Arrow > Issue Type: Improvement > Components: Plasma (C++) >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4452) [Python] Serializing sparse torch tensors
[ https://issues.apache.org/jira/browse/ARROW-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz updated ARROW-4452: -- Description: Using the pytorch serialization handler on sparse Tensors: {code:java} import torch i = torch.LongTensor([[0, 2], [1, 0], [1, 2]]) v = torch.FloatTensor([3, 4, 5 ]) tensor = torch.sparse.FloatTensor(i.t(), v, torch.Size([2,3])) pyarrow.serialization.register_torch_serialization_handlers(pyarrow.serialization._default_serialization_context) s = pyarrow.serialize(tensor, context=pyarrow.serialization._default_serialization_context) {code} Produces this result: {code:java} TypeError: can't convert sparse tensor to numpy. Use Tensor.to_dense() to convert to a dense tensor first.{code} We should provide a way to serialize sparse torch tensors, especially now that we are getting support for sparse Tensors. was: Using the pytorch serialization handler on sparse Tensors: {code:java} import torch i = torch.LongTensor([[0, 2], [1, 0], [1, 2]]) v = torch.FloatTensor([3, 4, 5 ]) tensor = torch.sparse.FloatTensor(i.t(), v, torch.Size([2,3])) register_torch_serialization_handlers(pyarrow.serialization._default_serialization_context) s = pyarrow.serialize(tensor, context=pyarrow.serialization._default_serialization_context) {code} Produces this result: {code:java} TypeError: can't convert sparse tensor to numpy. Use Tensor.to_dense() to convert to a dense tensor first.{code} We should provide a way to serialize sparse torch tensors, especially now that we are getting support for sparse Tensors. 
> [Python] Serializing sparse torch tensors > - > > Key: ARROW-4452 > URL: https://issues.apache.org/jira/browse/ARROW-4452 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Priority: Major > > Using the pytorch serialization handler on sparse Tensors: > {code:java} > import torch > i = torch.LongTensor([[0, 2], [1, 0], [1, 2]]) > v = torch.FloatTensor([3, 4, 5 ]) > tensor = torch.sparse.FloatTensor(i.t(), v, torch.Size([2,3])) > pyarrow.serialization.register_torch_serialization_handlers(pyarrow.serialization._default_serialization_context) > s = pyarrow.serialize(tensor, > context=pyarrow.serialization._default_serialization_context) {code} > Produces this result: > {code:java} > TypeError: can't convert sparse tensor to numpy. Use Tensor.to_dense() to > convert to a dense tensor first.{code} > We should provide a way to serialize sparse torch tensors, especially now > that we are getting support for sparse Tensors. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4452) [Python] Serializing sparse torch tensors
Philipp Moritz created ARROW-4452: - Summary: [Python] Serializing sparse torch tensors Key: ARROW-4452 URL: https://issues.apache.org/jira/browse/ARROW-4452 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz Using the pytorch serialization handler on sparse Tensors: {code:java} import torch i = torch.LongTensor([[0, 2], [1, 0], [1, 2]]) v = torch.FloatTensor([3, 4, 5 ]) tensor = torch.sparse.FloatTensor(i.t(), v, torch.Size([2,3])) pyarrow.serialization.register_torch_serialization_handlers(pyarrow.serialization._default_serialization_context) s = pyarrow.serialize(tensor, context=pyarrow.serialization._default_serialization_context) {code} Produces this result: {code:java} TypeError: can't convert sparse tensor to numpy. Use Tensor.to_dense() to convert to a dense tensor first.{code} We should provide a way to serialize sparse torch tensors, especially now that we are getting support for sparse Tensors. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
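One plausible approach (an assumption, not something pyarrow implemented at the time) is a handler that decomposes a sparse tensor into its COO components and rebuilds it on deserialization. Sketched here with plain lists standing in for torch tensors, and hypothetical function names:

```python
# Sketch: serialize a sparse tensor as its COO triple -- indices, values,
# and shape -- and reconstruct from that triple. A real handler would wrap
# torch tensors; lists and a tuple are used here to stay self-contained.

def serialize_sparse(indices, values, shape):
    return {"indices": indices, "values": values, "shape": shape}

def deserialize_sparse(payload):
    return payload["indices"], payload["values"], payload["shape"]

payload = serialize_sparse([[0, 1, 1], [2, 0, 2]], [3.0, 4.0, 5.0], (2, 3))
indices, values, shape = deserialize_sparse(payload)
print(shape)  # (2, 3)
```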
[jira] [Resolved] (ARROW-4422) [Plasma] Enforce memory limit in plasma, rather than relying on dlmalloc_set_footprint_limit
[ https://issues.apache.org/jira/browse/ARROW-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-4422. --- Resolution: Fixed Issue resolved by pull request 3526 [https://github.com/apache/arrow/pull/3526] > [Plasma] Enforce memory limit in plasma, rather than relying on > dlmalloc_set_footprint_limit > > > Key: ARROW-4422 > URL: https://issues.apache.org/jira/browse/ARROW-4422 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Plasma (C++) >Affects Versions: 0.12.0 >Reporter: Anurag Khandelwal >Assignee: Anurag Khandelwal >Priority: Minor > Fix For: 0.13.0 > > > Currently, Plasma relies on dlmalloc_set_footprint_limit to limit the memory > utilization for Plasma Store. This is restrictive because: > * It restricts Plasma to dlmalloc, which supports limiting memory footprint, > as opposed to other, potentially more performant malloc implementations > (e.g., jemalloc) > * dlmalloc_set_footprint_limit does not guarantee that the limit set by it > is the amount of _usable_ memory. As such, we might trigger evictions much > earlier than hitting this limit, e.g., due to fragmentation or metadata > overheads. > To overcome this, we can impose the memory limit at Plasma by tracking the > number of bytes allocated and freed using malloc and free calls. Whenever the > allocation reaches the set limit, we fail any subsequent allocations (i.e., > return NULL from malloc). This allows Plasma to not be tied to dlmalloc, and > also provides more accurate tracking of memory allocation/capacity. > Caveat: We will need to make sure that the mmaped files are living on a file > system that is a bit larger (depending on malloc implementation) than the > Plasma memory limit to account for the extra memory required due to > fragmentation/metadata overheads. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
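The byte-tracking scheme described in the issue can be sketched in a few lines. Class and method names here are illustrative; returning `None` stands in for `malloc` returning `NULL`:

```python
# Sketch of the proposal: count bytes handed out and freed, and refuse an
# allocation once the configured limit would be exceeded, instead of
# delegating the limit to the underlying malloc implementation.

class TrackingAllocator:
    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.allocated = 0

    def malloc(self, nbytes):
        if self.allocated + nbytes > self.limit:
            return None  # allocation refused; caller may evict and retry
        self.allocated += nbytes
        return bytearray(nbytes)

    def free(self, buf):
        self.allocated -= len(buf)

alloc = TrackingAllocator(limit_bytes=100)
buf = alloc.malloc(60)
print(alloc.malloc(60) is None)      # True: would exceed the limit
alloc.free(buf)
print(alloc.malloc(60) is not None)  # True again after freeing
```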
[jira] [Assigned] (ARROW-4379) Register pyarrow serializers for collections.Counter and collections.deque.
[ https://issues.apache.org/jira/browse/ARROW-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz reassigned ARROW-4379: - Assignee: Robert Nishihara > Register pyarrow serializers for collections.Counter and collections.deque. > --- > > Key: ARROW-4379 > URL: https://issues.apache.org/jira/browse/ARROW-4379 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Robert Nishihara >Assignee: Robert Nishihara >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-4379) Register pyarrow serializers for collections.Counter and collections.deque.
[ https://issues.apache.org/jira/browse/ARROW-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-4379. --- Resolution: Fixed Fix Version/s: 0.13.0 Issue resolved by pull request 3489 [https://github.com/apache/arrow/pull/3489] > Register pyarrow serializers for collections.Counter and collections.deque. > --- > > Key: ARROW-4379 > URL: https://issues.apache.org/jira/browse/ARROW-4379 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Robert Nishihara >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4378) [Plasma] Release objects upon Create
Philipp Moritz created ARROW-4378: - Summary: [Plasma] Release objects upon Create Key: ARROW-4378 URL: https://issues.apache.org/jira/browse/ARROW-4378 Project: Apache Arrow Issue Type: Improvement Components: Plasma (C++) Affects Versions: 0.13.0 Reporter: Philipp Moritz Similar to the way that {code:java} Get(const std::vector<ObjectID>& object_ids, int64_t timeout_ms, std::vector<ObjectBuffer>* out){code} releases the object when the shared_ptr inside of ObjectBuffer goes out of scope, the same should happen for {code} Status Create(const ObjectID& object_id, int64_t data_size, const uint8_t* metadata, int64_t metadata_size, std::shared_ptr<Buffer>* data); {code} At the moment, people have to remember to call Release() after they have created and sealed the object, which can make the C++ API cumbersome to use. Thanks to [~anuragkh] for reporting this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
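The lifetime-based release proposed here can be sketched in Python with a context manager playing the role of the shared_ptr deleter. `FakePlasmaClient` and `create` below are hypothetical stand-ins for illustration, not the real plasma API.

```python
import contextlib

class FakePlasmaClient:
    """Hypothetical stand-in for a plasma client; only records Release calls."""

    def __init__(self):
        self.released = []

    def release(self, object_id):
        self.released.append(object_id)

@contextlib.contextmanager
def create(client, object_id, data_size):
    # The bytearray stands in for the shared-memory buffer handed back by
    # Create(); tying release() to scope exit mirrors attaching a custom
    # deleter to the shared_ptr, so callers cannot forget Release().
    buf = bytearray(data_size)
    try:
        yield buf
    finally:
        client.release(object_id)
```

Usage: `with create(client, oid, 8) as buf: ...` releases the object automatically when the block exits, even on exceptions.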
[jira] [Resolved] (ARROW-4236) [JAVA] Distinct plasma client create exceptions
[ https://issues.apache.org/jira/browse/ARROW-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-4236. --- Resolution: Fixed Fix Version/s: 0.13.0 Issue resolved by pull request 3306 [https://github.com/apache/arrow/pull/3306] > [JAVA] Distinct plasma client create exceptions > --- > > Key: ARROW-4236 > URL: https://issues.apache.org/jira/browse/ARROW-4236 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Yuhong Guo >Assignee: Lin Yuan >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > [PR|https://github.com/apache/arrow/pull/3306] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4319) plasma/store.h pulls in flatbuffer dependency
[ https://issues.apache.org/jira/browse/ARROW-4319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748600#comment-16748600 ] Philipp Moritz commented on ARROW-4319: --- Hey Matthias, It should be possible to remove this dependency by shuffling around/forward declaring a few things. Do you want to submit a PR? Let me know if you run into any issues that require a deeper refactor. -- Philipp. > plasma/store.h pulls ins flatbuffer dependency > -- > > Key: ARROW-4319 > URL: https://issues.apache.org/jira/browse/ARROW-4319 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.12.0 >Reporter: Matthias Vallentin >Priority: Minor > Labels: build > Original Estimate: 1h > Remaining Estimate: 1h > > For our unit testing, we'd like to use the plasma store programmatically by > including *plasma/store.h*. However, this header pulls in flatbuffers via > *src/plasma/common_generated.h*. Is this a necessary include or would a > forward declaration suffice? > Installing just flatbuffers didn't solve the problem, though. 
It looks like a > specific version is needed: > {noformat} > In file included from /Users/mavam/code/src/arrow/cpp/src/plasma/store.h:30: > In file included from > /Users/mavam/code/src/arrow/cpp/src/plasma/eviction_policy.h:27: > In file included from /Users/mavam/code/src/arrow/cpp/src/plasma/plasma.h:41: > /Users/mavam/code/src/arrow/cpp/src/plasma/common_generated.h:65:21: error: > no matching member function for call to 'Verify' > verifier.Verify(object_id()) && > ~^~ > /usr/local/include/flatbuffers/flatbuffers.h:1896:29: note: candidate > template ignored: couldn't infer template argument 'T' > template bool Verify(size_t elem) const { > ^ > /usr/local/include/flatbuffers/flatbuffers.h:1905:29: note: candidate > function template not viable: requires 2 arguments, but 1 was provided > template bool Verify(const uint8_t *base, voffset_t elem_off) > ^ > /usr/local/include/flatbuffers/flatbuffers.h:1880:8: note: candidate > function not viable: requires 2 arguments, but 1 was provided > bool Verify(size_t elem, size_t elem_len) const { > ^ > /usr/local/include/flatbuffers/flatbuffers.h:1901:8: note: candidate > function not viable: requires 3 arguments, but 1 was provided > bool Verify(const uint8_t *base, voffset_t elem_off, size_t elem_len) const { > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-4295) [Plasma] Incorrect log message when evicting objects
[ https://issues.apache.org/jira/browse/ARROW-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz reassigned ARROW-4295: - Assignee: Anurag Khandelwal > [Plasma] Incorrect log message when evicting objects > > > Key: ARROW-4295 > URL: https://issues.apache.org/jira/browse/ARROW-4295 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Plasma (C++) >Affects Versions: 0.11.1 >Reporter: Anurag Khandelwal >Assignee: Anurag Khandelwal >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > When Plasma evicts objects on running out of memory, it prints log messages > of the form: > {quote}There is not enough space to create this object, so evicting x objects > to free up y bytes. The number of bytes in use (before this eviction) is > z.{quote} > However, the reported number of bytes in use (before this eviction) actually > reports the number of bytes *after* the eviction. A straightforward fix is to > simply replace z with (y+z). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-4295) [Plasma] Incorrect log message when evicting objects
[ https://issues.apache.org/jira/browse/ARROW-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-4295. --- Resolution: Fixed Issue resolved by pull request 3433 [https://github.com/apache/arrow/pull/3433] > [Plasma] Incorrect log message when evicting objects > > > Key: ARROW-4295 > URL: https://issues.apache.org/jira/browse/ARROW-4295 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Plasma (C++) >Affects Versions: 0.11.1 >Reporter: Anurag Khandelwal >Priority: Minor > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 20m > Remaining Estimate: 0h > > When Plasma evicts objects on running out of memory, it prints log messages > of the form: > {quote}There is not enough space to create this object, so evicting x objects > to free up y bytes. The number of bytes in use (before this eviction) is > z.{quote} > However, the reported number of bytes in use (before this eviction) actually > reports the number of bytes *after* the eviction. A straightforward fix is to > simply replace z with (y+z). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4285) [Python] Use proper builder interface for serialization
Philipp Moritz created ARROW-4285: - Summary: [Python] Use proper builder interface for serialization Key: ARROW-4285 URL: https://issues.apache.org/jira/browse/ARROW-4285 Project: Apache Arrow Issue Type: Improvement Components: Python Affects Versions: 0.12.0 Reporter: Philipp Moritz As a preparation for ARROW-3919, refactor the python serialization code such that the default builder interface is used. In the next step we can then plug in ChunkedBuilders to make sure that the generated arrays are properly chunked. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4269) [Python] AttributeError: module 'pandas.core' has no attribute 'arrays'
Philipp Moritz created ARROW-4269: - Summary: [Python] AttributeError: module 'pandas.core' has no attribute 'arrays' Key: ARROW-4269 URL: https://issues.apache.org/jira/browse/ARROW-4269 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz This happens with pandas 0.22:
```
In [1]: import pyarrow
---
AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 import pyarrow

~/arrow/python/pyarrow/__init__.py in <module>()
    174 localfs = LocalFileSystem.get_instance()
    175
--> 176 from pyarrow.serialization import (default_serialization_context,
    177     register_default_serialization_handlers,
    178     register_torch_serialization_handlers)

~/arrow/python/pyarrow/serialization.py in <module>()
    303
    304
--> 305 register_default_serialization_handlers(_default_serialization_context)

~/arrow/python/pyarrow/serialization.py in register_default_serialization_handlers(serialization_context)
    294     custom_deserializer=_deserialize_pyarrow_table)
    295
--> 296 _register_custom_pandas_handlers(serialization_context)
    297
    298

~/arrow/python/pyarrow/serialization.py in _register_custom_pandas_handlers(context)
    175     custom_deserializer=_load_pickle_from_buffer)
    176
--> 177 if hasattr(pd.core.arrays, 'interval'):
    178     context.register_type(
    179         pd.core.arrays.interval.IntervalArray,

AttributeError: module 'pandas.core' has no attribute 'arrays'
```
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4249) [Plasma] Remove reference to logging.h from plasma/common.h
Philipp Moritz created ARROW-4249: - Summary: [Plasma] Remove reference to logging.h from plasma/common.h Key: ARROW-4249 URL: https://issues.apache.org/jira/browse/ARROW-4249 Project: Apache Arrow Issue Type: Improvement Components: Plasma (C++) Affects Versions: 0.11.1 Reporter: Philipp Moritz Assignee: Philipp Moritz Fix For: 0.13.0 It is not needed there and pollutes the namespace of applications that use the plasma client with arrow's DCHECK macros (DCHECK is a name widely used in other projects). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-4236) [JAVA] Distinct plasma client create exceptions
[ https://issues.apache.org/jira/browse/ARROW-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz reassigned ARROW-4236: - Assignee: Lin Yuan > [JAVA] Distinct plasma client create exceptions > --- > > Key: ARROW-4236 > URL: https://issues.apache.org/jira/browse/ARROW-4236 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Yuhong Guo >Assignee: Lin Yuan >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > [PR|https://github.com/apache/arrow/pull/3306] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4217) [Plasma] Remove custom object metadata
Philipp Moritz created ARROW-4217: - Summary: [Plasma] Remove custom object metadata Key: ARROW-4217 URL: https://issues.apache.org/jira/browse/ARROW-4217 Project: Apache Arrow Issue Type: Improvement Components: Plasma (C++) Affects Versions: 0.11.1 Reporter: Philipp Moritz Assignee: Philipp Moritz Fix For: 0.13.0 Currently, Plasma supports custom metadata for objects. This doesn't seem to be used at the moment, and it will simplify the interface and implementation to remove it. Removing the custom metadata will also make eviction to other blob stores easier (most other stores don't support custom metadata). My personal use case was to store arrow schemata in there, but they are now stored as part of the object itself. If nobody else is using this, I'd suggest removing it. If people really want metadata, they could always store it as a separate object if desired. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4217) [Plasma] Remove custom object metadata
[ https://issues.apache.org/jira/browse/ARROW-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz updated ARROW-4217: -- Description: Currently, Plasma supports custom metadata for objects. This doesn't seem to be used at the moment, and removing it will simplify the interface and implementation of plasma. Removing the custom metadata will also make eviction to other blob stores easier (most other stores don't support custom metadata). My personal use case was to store arrow schemata in there, but they are now stored as part of the object itself. If nobody else is using this, I'd suggest removing it. If people really want metadata, they could always store it as a separate object if desired. was: Currently, Plasma supports custom metadata for objects. This doesn't seem to be used at the moment, and it will simplify the interface and implementation to remove it. Removing the custom metadata will also make eviction to other blob stores easier (most other stores don't support custom metadata). My personal use case was to store arrow schemata in there, but they are now stored as part of the object itself. If nobody else is using this, I'd suggest removing it. If people really want metadata, they could always store it as a separate object if desired. > [Plasma] Remove custom object metadata > -- > > Key: ARROW-4217 > URL: https://issues.apache.org/jira/browse/ARROW-4217 > Project: Apache Arrow > Issue Type: Improvement > Components: Plasma (C++) >Affects Versions: 0.11.1 >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Minor > Fix For: 0.13.0 > > > Currently, Plasma supports custom metadata for objects. This doesn't seem to > be used at the moment, and removing it will simplify the interface and > implementation of plasma. Removing the custom metadata will also make > eviction to other blob stores easier (most other stores don't support custom > metadata). 
> My personal use case was to store arrow schemata in there, but they are now > stored as part of the object itself. > If nobody else is using this, I'd suggest removing it. If people really want > metadata, they could always store it as a separate object if desired. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4024) [Python] Cython compilation error on cython==0.27.3
[ https://issues.apache.org/jira/browse/ARROW-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723449#comment-16723449 ] Philipp Moritz commented on ARROW-4024: --- I think it's fine to drop support for 0.27. I just wanted to point it out because from [https://github.com/apache/arrow/blob/9da458437162574f3e0d82e4a51dc6c1589b9f94/python/setup.py#L45] [https://github.com/apache/arrow/blob/9da458437162574f3e0d82e4a51dc6c1589b9f94/python/setup.py#L578] it looks like 0.27 is supported. > [Python] Cython compilation error on cython==0.27.3 > --- > > Key: ARROW-4024 > URL: https://issues.apache.org/jira/browse/ARROW-4024 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Priority: Major > > On the latest master, I'm getting the following error: > {code:java} > [ 11%] Compiling Cython CXX source for lib... > Error compiling Cython file: > > ... > out.init(type) > return out > cdef object pyarrow_wrap_metadata( > ^ > > pyarrow/public-api.pxi:95:5: Function signature does not match previous > declaration > CMakeFiles/lib_pyx.dir/build.make:57: recipe for target 'CMakeFiles/lib_pyx' > failed{code} > With 0.29.0 it is working. This might have been introduced in > [https://github.com/apache/arrow/commit/12201841212967c78e31b2d2840b55b1707c4e7b] > but I'm not sure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-4015) [Plasma] remove legacy interfaces for plasma manager
[ https://issues.apache.org/jira/browse/ARROW-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-4015. --- Resolution: Fixed Fix Version/s: 0.12.0 Issue resolved by pull request 3167 [https://github.com/apache/arrow/pull/3167] > [Plasma] remove legacy interfaces for plasma manager > > > Key: ARROW-4015 > URL: https://issues.apache.org/jira/browse/ARROW-4015 > Project: Apache Arrow > Issue Type: Improvement > Components: Plasma (C++) >Reporter: Zhijun Fu >Assignee: Zhijun Fu >Priority: Minor > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > [https://github.com/apache/arrow/issues/3154] > In legacy ray, interacting with remote plasma stores is done via plasma > manager, which is part of ray, and plasma has a few interfaces to support it > - namely Fetch() and Wait(). > Currently the legacy ray code has already been removed, and the new raylet > uses object manager to interface with remote machine, and these legacy plasma > interfaces are no longer used. I think we could remove these legacy > interfaces to cleanup code and avoid confusion. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4025) [Python] TensorFlow/PyTorch arrow ThreadPool workarounds not working in some settings
Philipp Moritz created ARROW-4025: - Summary: [Python] TensorFlow/PyTorch arrow ThreadPool workarounds not working in some settings Key: ARROW-4025 URL: https://issues.apache.org/jira/browse/ARROW-4025 Project: Apache Arrow Issue Type: Improvement Affects Versions: 0.11.1 Reporter: Philipp Moritz See the bug report in [https://github.com/ray-project/ray/issues/3520] I wonder if we can revisit this issue and try to get rid of the workarounds we tried to deploy in the past. See also the discussion in [https://github.com/apache/arrow/pull/2096] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4024) [Python] Cython compilation error on cython==0.27.3
Philipp Moritz created ARROW-4024: - Summary: [Python] Cython compilation error on cython==0.27.3 Key: ARROW-4024 URL: https://issues.apache.org/jira/browse/ARROW-4024 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz On the latest master, I'm getting the following error: {code:java} [ 11%] Compiling Cython CXX source for lib... Error compiling Cython file: ... out.init(type) return out cdef object pyarrow_wrap_metadata( ^ pyarrow/public-api.pxi:95:5: Function signature does not match previous declaration CMakeFiles/lib_pyx.dir/build.make:57: recipe for target 'CMakeFiles/lib_pyx' failed{code} With 0.29.0 it is working. This might have been introduced in [https://github.com/apache/arrow/commit/12201841212967c78e31b2d2840b55b1707c4e7b] but I'm not sure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3950) [Plasma] Don't force loading the TensorFlow op on import
[ https://issues.apache.org/jira/browse/ARROW-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-3950. --- Resolution: Fixed Fix Version/s: 0.12.0 Issue resolved by pull request 3117 [https://github.com/apache/arrow/pull/3117] > [Plasma] Don't force loading the TensorFlow op on import > > > Key: ARROW-3950 > URL: https://issues.apache.org/jira/browse/ARROW-3950 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 20m > Remaining Estimate: 0h > > In certain situations, users want more control over when the TensorFlow op is > loaded, so we should make it optional (even if it exists). This happens in > Ray, for example, where we need to make sure that if multiple python workers > try to compile and import the TensorFlow op in parallel, there is no race > condition (e.g. one worker could try to import a half-built version of the > op). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3958) [Plasma] Reduce number of IPCs
Philipp Moritz created ARROW-3958: - Summary: [Plasma] Reduce number of IPCs Key: ARROW-3958 URL: https://issues.apache.org/jira/browse/ARROW-3958 Project: Apache Arrow Issue Type: Improvement Components: Plasma (C++) Affects Versions: 0.11.1 Reporter: Philipp Moritz Assignee: Philipp Moritz Fix For: 0.12.0 Currently we ship file descriptors of objects from the store to the client every time an object is created or gotten. There is relatively few distinct file descriptors, so caching them can get rid of one IPC in the majority of cases. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
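The caching idea in ARROW-3958 (few distinct file descriptors, so cache them client-side and skip the IPC on repeat lookups) can be sketched as follows; `FdCache` and `fetch_fd` are illustrative names, not part of the actual Plasma client.

```python
class FdCache:
    """Sketch: cache file descriptors shipped from the store to the client,
    so objects backed by an already-seen mmapped file skip one IPC."""

    def __init__(self, fetch_fd):
        self._fetch_fd = fetch_fd  # performs the IPC on a cache miss
        self._cache = {}
        self.ipc_count = 0

    def get(self, store_fd_id):
        if store_fd_id not in self._cache:
            # Cache miss: do the IPC once and remember the result.
            self.ipc_count += 1
            self._cache[store_fd_id] = self._fetch_fd(store_fd_id)
        return self._cache[store_fd_id]
```

Since the store reuses a small set of mmapped files, the hit rate is high and most Create/Get calls avoid the extra round trip.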
[jira] [Created] (ARROW-3950) [Plasma] Don't force loading the TensorFlow op on import
Philipp Moritz created ARROW-3950: - Summary: [Plasma] Don't force loading the TensorFlow op on import Key: ARROW-3950 URL: https://issues.apache.org/jira/browse/ARROW-3950 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz Assignee: Philipp Moritz In certain situations, users want more control over when the TensorFlow op is loaded, so we should make it optional (even if it exists). This happens in Ray, for example, where we need to make sure that if multiple python workers try to compile and import the TensorFlow op in parallel, there is no race condition (e.g. one worker could try to import a half-built version of the op). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3934) [Gandiva] Don't compile precompiled tests if ARROW_GANDIVA_BUILD_TESTS=off
Philipp Moritz created ARROW-3934: - Summary: [Gandiva] Don't compile precompiled tests if ARROW_GANDIVA_BUILD_TESTS=off Key: ARROW-3934 URL: https://issues.apache.org/jira/browse/ARROW-3934 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz Assignee: Philipp Moritz Fix For: 0.12.0 Currently the precompiled tests are compiled in any case, even if ARROW_GANDIVA_BUILD_TESTS=off. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3199) [Plasma] Check for EAGAIN in recvmsg and sendmsg
[ https://issues.apache.org/jira/browse/ARROW-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-3199. --- Resolution: Fixed Issue resolved by pull request 2551 [https://github.com/apache/arrow/pull/2551] > [Plasma] Check for EAGAIN in recvmsg and sendmsg > > > Key: ARROW-3199 > URL: https://issues.apache.org/jira/browse/ARROW-3199 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > It turns out that > [https://github.com/apache/arrow/blob/673125fd416cbd2e5c2cb9cb6a4c925adecdaf2c/cpp/src/plasma/fling.cc#L63] > and probably also > [https://github.com/apache/arrow/blob/673125fd416cbd2e5c2cb9cb6a4c925adecdaf2c/cpp/src/plasma/fling.cc#L49] > can block and give an EAGAIN error. > This was discovered during stress tests by https://github.com/stephanie-wang/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
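The fix in ARROW-3199 amounts to treating EAGAIN from sendmsg/recvmsg as retryable rather than fatal. A minimal sketch, where `raw_send` is a hypothetical wrapper returning `(bytes_sent, errno_or_0)` rather than a real socket call:

```python
import errno

def send_with_retry(raw_send, data, max_retries=100):
    """Sketch: retry a non-blocking send that fails with EAGAIN instead
    of surfacing it as an error to the caller."""
    for _ in range(max_retries):
        nbytes, err = raw_send(data)
        if err == errno.EAGAIN:
            continue  # kernel buffer temporarily full; try again
        return nbytes, err
    return -1, errno.EAGAIN  # still blocked after all retries
```

A production version would typically also wait on the socket (e.g. with poll/select) between retries instead of spinning.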
[jira] [Resolved] (ARROW-2759) Export notification socket of Plasma
[ https://issues.apache.org/jira/browse/ARROW-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-2759. --- Resolution: Fixed Fix Version/s: 0.12.0 Issue resolved by pull request 3008 [https://github.com/apache/arrow/pull/3008] > Export notification socket of Plasma > > > Key: ARROW-2759 > URL: https://issues.apache.org/jira/browse/ARROW-2759 > Project: Apache Arrow > Issue Type: Improvement > Components: Plasma (C++), Python >Reporter: Siyuan Zhuang >Assignee: Siyuan Zhuang >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Currently, I am implementing an async interface for Ray. The implementation > needs some kind of message polling methods like `get_next_notification`. > Unfortunately, I find `get_next_notification` in > `[https://github.com/apache/arrow/blob/master/python/pyarrow/_plasma.pyx]` > blocking, which is an impediment to implementing async utilities. Also, it's > hard to check the status of the socket (it could be closed or break up). So I > suggest export the notification socket so that there will be more flexibility. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3920) Plasma reference counting not properly done in TensorFlow custom operator.
[ https://issues.apache.org/jira/browse/ARROW-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-3920. --- Resolution: Fixed Fix Version/s: 0.12.0 Issue resolved by pull request 3061 [https://github.com/apache/arrow/pull/3061] > Plasma reference counting not properly done in TensorFlow custom operator. > -- > > Key: ARROW-3920 > URL: https://issues.apache.org/jira/browse/ARROW-3920 > Project: Apache Arrow > Issue Type: Bug > Components: Plasma (C++) >Reporter: Robert Nishihara >Assignee: Robert Nishihara >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h > Remaining Estimate: 0h > > We never call {{Release}} in the custom op code. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3919) [Python] Support 64 bit indices for pyarrow.serialize and pyarrow.deserialize
[ https://issues.apache.org/jira/browse/ARROW-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz reassigned ARROW-3919: - Assignee: Philipp Moritz > [Python] Support 64 bit indices for pyarrow.serialize and pyarrow.deserialize > - > > Key: ARROW-3919 > URL: https://issues.apache.org/jira/browse/ARROW-3919 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Major > > see https://github.com/modin-project/modin/issues/266 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3919) [Python] Support 64 bit indices for pyarrow.serialize and pyarrow.deserialize
Philipp Moritz created ARROW-3919: - Summary: [Python] Support 64 bit indices for pyarrow.serialize and pyarrow.deserialize Key: ARROW-3919 URL: https://issues.apache.org/jira/browse/ARROW-3919 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz see https://github.com/modin-project/modin/issues/266 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3199) [Plasma] Check for EAGAIN in recvmsg and sendmsg
[ https://issues.apache.org/jira/browse/ARROW-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704017#comment-16704017 ] Philipp Moritz commented on ARROW-3199: --- Sorry, I was travelling last week! Yeah, let me finish this PR. > [Plasma] Check for EAGAIN in recvmsg and sendmsg > > > Key: ARROW-3199 > URL: https://issues.apache.org/jira/browse/ARROW-3199 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h > Remaining Estimate: 0h > > It turns out that > [https://github.com/apache/arrow/blob/673125fd416cbd2e5c2cb9cb6a4c925adecdaf2c/cpp/src/plasma/fling.cc#L63] > and probably also > [https://github.com/apache/arrow/blob/673125fd416cbd2e5c2cb9cb6a4c925adecdaf2c/cpp/src/plasma/fling.cc#L49] > can block and give an EAGAIN error. > This was discovered during stress tests by https://github.com/stephanie-wang/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3765) [Gandiva] Segfault when the validity bitmap has not been allocated
[ https://issues.apache.org/jira/browse/ARROW-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-3765. --- Resolution: Fixed Fix Version/s: 0.12.0 Issue resolved by pull request 2967 [https://github.com/apache/arrow/pull/2967] > [Gandiva] Segfault when the validity bitmap has not been allocated > -- > > Key: ARROW-3765 > URL: https://issues.apache.org/jira/browse/ARROW-3765 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Gandiva >Reporter: Siyuan Zhuang >Assignee: Siyuan Zhuang >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 40m > Remaining Estimate: 0h > > This is because the `validity buffer` could be `None`: > {code} > >>> df = pd.DataFrame(np.random.randint(0, 100, size=(2**12, 10))) > >>> pa.Table.from_pandas(df).to_batches()[0].column(0).buffers() > [None, ] > >>> df = pd.DataFrame(np.random.randint(0, 100, size=(2**12, 10))*1.0) > >>> pa.Table.from_pandas(df).to_batches()[0].column(0).buffers() > [, 0x11a2b3228>]{code} > But Gandiva has not implemented it yet, thus accessing a nullptr: > {code} > void Annotator::PrepareBuffersForField(const FieldDescriptor& desc, const > arrow::ArrayData& array_data, EvalBatch* eval_batch) { > int buffer_idx = 0; > // TODO: > // - validity is optional > uint8_t* validity_buf = > const_cast(array_data.buffers[buffer_idx]->data()); > eval_batch->SetBuffer(desc.validity_idx(), validity_buf); > ++buffer_idx; > {code} > > Reproduce code: > {code:java} > frame_data = np.random.randint(0, 100, size=(2**22, 10)) > table = pa.Table.from_pandas(df) > filt = ... 
# Create any gandiva filter > r = filt.evaluate(table.to_batches()[0], pa.default_memory_pool()) # > segfault{code} > Backtrace: > {code:java} > * thread #2, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS > (code=1, address=0x10) > * frame #0: 0x0001060184fc > libarrow.12.dylib`arrow::Buffer::data(this=0x) const at > buffer.h:162 > frame #1: 0x000106fbed78 > libgandiva.12.dylib`gandiva::Annotator::PrepareBuffersForField(this=0x000100624dc8, > desc=0x00010101e138, array_data=0x00010061f8e8, > eval_batch=0x000100796848) at annotator.cc:65 > frame #2: 0x000106fbf4ed > libgandiva.12.dylib`gandiva::Annotator::PrepareEvalBatch(this=0x000100624dc8, > record_batch=0x0001007a45b8, out_vector=size=1) at annotator.cc:94 > frame #3: 0x0001071449b7 > libgandiva.12.dylib`gandiva::LLVMGenerator::Execute(this=0x000100624da0, > record_batch=0x0001007a45b8, output_vector=size=1) at > llvm_generator.cc:102 > frame #4: 0x000107059a4f > libgandiva.12.dylib`gandiva::Filter::Evaluate(this=0x00010079c668, > batch=0x0001007a45b8, > out_selection=std::__1::shared_ptr::element_type @ > 0x0001007a43e8 strong=2 weak=1) at filter.cc:106 > frame #5: 0x00010948e002 > gandiva.cpython-36m-darwin.so`__pyx_pw_7pyarrow_7gandiva_6Filter_3evaluate(_object*, > _object*, _object*) + 1986 > frame #6: 0x000100140e8b Python`_PyCFunction_FastCallDict + 475 > frame #7: 0x0001001d28ca Python`call_function + 602 > frame #8: 0x0001001cf798 Python`_PyEval_EvalFrameDefault + 24616 > frame #9: 0x0001001d3cf9 Python`fast_function + 569 > frame #10: 0x0001001d2899 Python`call_function + 553 > frame #11: 0x0001001cf798 Python`_PyEval_EvalFrameDefault + 24616 > frame #12: 0x0001001d34c6 Python`_PyEval_EvalCodeWithName + 2902 > frame #13: 0x0001001c96e0 Python`PyEval_EvalCode + 48 > frame #14: 0x0001002029ae Python`PyRun_FileExFlags + 174 > frame #15: 0x000100201f75 Python`PyRun_SimpleFileExFlags + 277 > frame #16: 0x00010021ef46 Python`Py_Main + 3558 > frame #17: 0x00010e08 
Python`___lldb_unnamed_symbol1$$Python + 248 > frame #18: 0x7fff6ea72085 libdyld.dylib`start + 1{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
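The fix the report points at, treating a missing (None) validity buffer as "all values valid" instead of dereferencing a null pointer, can be sketched as follows. The helper and its argument layout are illustrative, not Gandiva's actual code.

```python
def validity_bits_or_all_valid(buffers, length):
    """Sketch: buffers[0] is the optional validity bitmap (as in an Arrow
    ArrayData). When it was never allocated (None), every slot is valid."""
    validity = buffers[0]
    if validity is None:
        # No null bitmap allocated: by Arrow convention, no nulls exist.
        return [True] * length
    # Otherwise decode the least-significant-bit-first bitmap.
    return [bool((validity[i // 8] >> (i % 8)) & 1) for i in range(length)]
```

This mirrors the Arrow convention that a null validity buffer means a null count of zero, which is exactly the case the integer-column reproduction above triggers.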
[jira] [Resolved] (ARROW-3751) [Python] Add more cython bindings for gandiva
[ https://issues.apache.org/jira/browse/ARROW-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-3751. --- Resolution: Fixed Fix Version/s: 0.12.0 Issue resolved by pull request 2936 [https://github.com/apache/arrow/pull/2936] > [Python] Add more cython bindings for gandiva > - > > Key: ARROW-3751 > URL: https://issues.apache.org/jira/browse/ARROW-3751 > Project: Apache Arrow > Issue Type: Improvement > Components: Gandiva, Python >Reporter: Siyuan Zhuang >Assignee: Siyuan Zhuang >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > There are some cython bindings lost in ARROW-3602 (MakeAdd, MakeOr, MakeIn). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3746) [Gandiva] [Python] Make it possible to list all functions registered with Gandiva
[ https://issues.apache.org/jira/browse/ARROW-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philipp Moritz reassigned ARROW-3746:
    Assignee: Philipp Moritz

> [Gandiva] [Python] Make it possible to list all functions registered with Gandiva
>
> Key: ARROW-3746
> URL: https://issues.apache.org/jira/browse/ARROW-3746
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Philipp Moritz
> Assignee: Philipp Moritz
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.12.0
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> This will also be useful for documentation purposes (right now it is not very easy to get a list of all the functions that are registered).
[jira] [Resolved] (ARROW-3746) [Gandiva] [Python] Make it possible to list all functions registered with Gandiva
[ https://issues.apache.org/jira/browse/ARROW-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philipp Moritz resolved ARROW-3746.
    Resolution: Fixed
    Fix Version/s: 0.12.0

Issue resolved by pull request 2933: https://github.com/apache/arrow/pull/2933

> [Gandiva] [Python] Make it possible to list all functions registered with Gandiva
>
> Key: ARROW-3746
> URL: https://issues.apache.org/jira/browse/ARROW-3746
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Philipp Moritz
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.12.0
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> This will also be useful for documentation purposes (right now it is not very easy to get a list of all the functions that are registered).
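The binding this issue added exposes Gandiva's function registry from Python; per the pyarrow Gandiva bindings, the entry point is `pyarrow.gandiva.get_registered_function_signatures()`. A hedged sketch (the import is guarded because Gandiva is an optional pyarrow component, and the signature-accessor names are assumptions about the bindings' API):

```python
# List Gandiva's registered functions, e.g. to generate documentation.
try:
    import pyarrow.gandiva as gandiva
    signatures = gandiva.get_registered_function_signatures()
    # Each signature carries a function name plus parameter/return types;
    # collecting the distinct names gives the "list all functions" view.
    names = sorted({sig.name() for sig in signatures})
except ImportError:
    signatures, names = [], []  # pyarrow was built without Gandiva
```

With Gandiva available, `names` contains entries such as arithmetic and comparison function names drawn from the C++ function registry.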
[jira] [Resolved] (ARROW-3742) Fix pyarrow.types & gandiva cython bindings
[ https://issues.apache.org/jira/browse/ARROW-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philipp Moritz resolved ARROW-3742.
    Resolution: Fixed
    Fix Version/s: 0.12.0

Issue resolved by pull request 2931: https://github.com/apache/arrow/pull/2931

> Fix pyarrow.types & gandiva cython bindings
>
> Key: ARROW-3742
> URL: https://issues.apache.org/jira/browse/ARROW-3742
> Project: Apache Arrow
> Issue Type: Bug
> Components: Gandiva, Python
> Reporter: Siyuan Zhuang
> Assignee: Siyuan Zhuang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.12.0
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> 1. `types.py` didn't export `_as_type`, causing failures in certain Cython/Python combinations. I am surprised that the CI didn't fail.
> 2. After updating the gandiva C++ part (ARROW-3587), the cython bindings (ARROW-3602) are no longer consistent.
[jira] [Created] (ARROW-3746) [Gandiva] [Python] Make it possible to list all functions registered with Gandiva
Philipp Moritz created ARROW-3746:
    Summary: [Gandiva] [Python] Make it possible to list all functions registered with Gandiva
    Key: ARROW-3746
    URL: https://issues.apache.org/jira/browse/ARROW-3746
    Project: Apache Arrow
    Issue Type: Improvement
    Reporter: Philipp Moritz

This will also be useful for documentation purposes (right now it is not very easy to get a list of all the functions that are registered).
[jira] [Assigned] (ARROW-3602) [Gandiva] [Python] Add preliminary Cython bindings for Gandiva
[ https://issues.apache.org/jira/browse/ARROW-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philipp Moritz reassigned ARROW-3602:
    Assignee: Philipp Moritz

> [Gandiva] [Python] Add preliminary Cython bindings for Gandiva
>
> Key: ARROW-3602
> URL: https://issues.apache.org/jira/browse/ARROW-3602
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Affects Versions: 0.11.1
> Reporter: Philipp Moritz
> Assignee: Philipp Moritz
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.12.0
> Time Spent: 5h
> Remaining Estimate: 0h
>
> Adding a first version of Cython bindings for Gandiva so it can be called from Python.
[jira] [Resolved] (ARROW-3587) [Python] Efficient serialization for Arrow Objects (array, table, tensor, etc)
[ https://issues.apache.org/jira/browse/ARROW-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philipp Moritz resolved ARROW-3587.
    Resolution: Fixed
    Fix Version/s: 0.12.0

Issue resolved by pull request 2832: https://github.com/apache/arrow/pull/2832

> [Python] Efficient serialization for Arrow Objects (array, table, tensor, etc)
>
> Key: ARROW-3587
> URL: https://issues.apache.org/jira/browse/ARROW-3587
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Plasma (C++), Python
> Reporter: Siyuan Zhuang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.12.0
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> Currently, Arrow seems to have poor serialization support for its own objects. For example:
>
> {code}
> import pyarrow
> arr = pyarrow.array([1, 2, 3, 4])
> pyarrow.serialize(arr)
> {code}
>
> Traceback (most recent call last):
>   File "", line 1, in
>   File "pyarrow/serialization.pxi", line 337, in pyarrow.lib.serialize
>   File "pyarrow/serialization.pxi", line 136, in pyarrow.lib.SerializationContext._serialize_callback
> pyarrow.lib.SerializationCallbackError: pyarrow does not know how to serialize objects of type .
>
> I am working on the Ray & Modin projects, using Plasma to store Arrow objects. The lack of direct serialization support harms performance, so I would like to push a PR to fix this problem.
> I wonder whether that would be welcome, or whether someone else is already working on it?
[jira] [Created] (ARROW-3721) [Gandiva] [Python] Support all Gandiva literals
Philipp Moritz created ARROW-3721:
    Summary: [Gandiva] [Python] Support all Gandiva literals
    Key: ARROW-3721
    URL: https://issues.apache.org/jira/browse/ARROW-3721
    Project: Apache Arrow
    Issue Type: Improvement
    Reporter: Philipp Moritz

Support all the literals from [https://github.com/apache/arrow/blob/5b116ab175292fe70ed3c8727bcc6868b9695f4a/cpp/src/gandiva/tree_expr_builder.h#L35] in the Cython bindings.
[jira] [Resolved] (ARROW-3718) [Gandiva] Remove spurious gtest include
[ https://issues.apache.org/jira/browse/ARROW-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philipp Moritz resolved ARROW-3718.
    Resolution: Fixed

Issue resolved by pull request 2917: https://github.com/apache/arrow/pull/2917

> [Gandiva] Remove spurious gtest include
>
> Key: ARROW-3718
> URL: https://issues.apache.org/jira/browse/ARROW-3718
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Gandiva
> Affects Versions: 0.11.1
> Reporter: Philipp Moritz
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.12.0
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> At the moment, cpp/src/gandiva/expr_decomposer.h includes a gtest header, which can prevent gandiva from being built without the gtest dependency.