[jira] [Commented] (ARROW-2251) [GLib] Destroying GArrowBuffer while GArrowTensor that uses the buffer causes a crash
[ https://issues.apache.org/jira/browse/ARROW-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385596#comment-16385596 ] ASF GitHub Bot commented on ARROW-2251: --- wesm commented on issue #1691: ARROW-2251: [GLib] Keep GArrowBuffer alive while GArrowTensor for the buffer is live URL: https://github.com/apache/arrow/pull/1691#issuecomment-370307596 I see, so there's a "weak" reference to memory that is held by another buffer object. We have to deal with some issues like this in Python to handle dependency chains where the `shared_ptr` is not aware of memory relationships expressed at the Python level. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [GLib] Destroying GArrowBuffer while GArrowTensor that uses the buffer causes > a crash > - > > Key: ARROW-2251 > URL: https://issues.apache.org/jira/browse/ARROW-2251 > Project: Apache Arrow > Issue Type: Bug > Components: GLib >Affects Versions: 0.8.0 >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Minor > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2251) [GLib] Destroying GArrowBuffer while GArrowTensor that uses the buffer causes a crash
[ https://issues.apache.org/jira/browse/ARROW-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385598#comment-16385598 ]

ASF GitHub Bot commented on ARROW-2251:
---

wesm closed pull request #1691: ARROW-2251: [GLib] Keep GArrowBuffer alive while GArrowTensor for the buffer is live
URL: https://github.com/apache/arrow/pull/1691

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/c_glib/arrow-glib/input-stream.cpp b/c_glib/arrow-glib/input-stream.cpp
index 94422241b..f602e5f7e 100644
--- a/c_glib/arrow-glib/input-stream.cpp
+++ b/c_glib/arrow-glib/input-stream.cpp
@@ -282,7 +282,7 @@ garrow_seekable_input_stream_read_tensor(GArrowSeekableInputStream *input_stream
                                arrow_random_access_file.get(),
                                &arrow_tensor);
   if (garrow_error_check(error, status, "[seekable-input-stream][read-tensor]")) {
-    return garrow_tensor_new_raw(&arrow_tensor);
+    return garrow_tensor_new_raw(&arrow_tensor, nullptr);
   } else {
     return NULL;
   }
diff --git a/c_glib/arrow-glib/tensor.cpp b/c_glib/arrow-glib/tensor.cpp
index 3325f8511..359831f67 100644
--- a/c_glib/arrow-glib/tensor.cpp
+++ b/c_glib/arrow-glib/tensor.cpp
@@ -40,11 +40,13 @@ G_BEGIN_DECLS
 
 typedef struct GArrowTensorPrivate_ {
   std::shared_ptr<arrow::Tensor> tensor;
+  GArrowBuffer *buffer;
 } GArrowTensorPrivate;
 
 enum {
   PROP_0,
-  PROP_TENSOR
+  PROP_TENSOR,
+  PROP_BUFFER
 };
 
 G_DEFINE_TYPE_WITH_PRIVATE(GArrowTensor, garrow_tensor, G_TYPE_OBJECT)
@@ -52,6 +54,19 @@ G_DEFINE_TYPE_WITH_PRIVATE(GArrowTensor, garrow_tensor, G_TYPE_OBJECT)
 #define GARROW_TENSOR_GET_PRIVATE(obj) \
   (G_TYPE_INSTANCE_GET_PRIVATE((obj), GARROW_TYPE_TENSOR, GArrowTensorPrivate))
 
+static void
+garrow_tensor_dispose(GObject *object)
+{
+  auto priv = GARROW_TENSOR_GET_PRIVATE(object);
+
+  if (priv->buffer) {
+    g_object_unref(priv->buffer);
+    priv->buffer = nullptr;
+  }
+
+  G_OBJECT_CLASS(garrow_tensor_parent_class)->dispose(object);
+}
+
 static void
 garrow_tensor_finalize(GObject *object)
 {
@@ -64,9 +79,9 @@ garrow_tensor_finalize(GObject *object)
 
 static void
 garrow_tensor_set_property(GObject *object,
-                          guint prop_id,
-                          const GValue *value,
-                          GParamSpec *pspec)
+                           guint prop_id,
+                           const GValue *value,
+                           GParamSpec *pspec)
 {
   auto priv = GARROW_TENSOR_GET_PRIVATE(object);
 
@@ -75,6 +90,9 @@ garrow_tensor_set_property(GObject *object,
     priv->tensor =
       *static_cast<std::shared_ptr<arrow::Tensor> *>(g_value_get_pointer(value));
     break;
+  case PROP_BUFFER:
+    priv->buffer = GARROW_BUFFER(g_value_dup_object(value));
+    break;
   default:
     G_OBJECT_WARN_INVALID_PROPERTY_ID(object, prop_id, pspec);
     break;
@@ -83,11 +101,16 @@ garrow_tensor_set_property(GObject *object,
 
 static void
 garrow_tensor_get_property(GObject *object,
-                          guint prop_id,
-                          GValue *value,
-                          GParamSpec *pspec)
+                           guint prop_id,
+                           GValue *value,
+                           GParamSpec *pspec)
 {
+  auto priv = GARROW_TENSOR_GET_PRIVATE(object);
+
   switch (prop_id) {
+  case PROP_BUFFER:
+    g_value_set_object(value, priv->buffer);
+    break;
   default:
     G_OBJECT_WARN_INVALID_PROPERTY_ID(object, prop_id, pspec);
     break;
@@ -106,6 +129,7 @@ garrow_tensor_class_init(GArrowTensorClass *klass)
 
   auto gobject_class = G_OBJECT_CLASS(klass);
 
+  gobject_class->dispose = garrow_tensor_dispose;
   gobject_class->finalize = garrow_tensor_finalize;
   gobject_class->set_property = garrow_tensor_set_property;
   gobject_class->get_property = garrow_tensor_get_property;
@@ -116,6 +140,14 @@ garrow_tensor_class_init(GArrowTensorClass *klass)
                               static_cast<GParamFlags>(G_PARAM_WRITABLE |
                                                        G_PARAM_CONSTRUCT_ONLY));
   g_object_class_install_property(gobject_class, PROP_TENSOR, spec);
+
+  spec = g_param_spec_object("buffer",
+                             "Buffer",
+                             "The data",
+                             GARROW_TYPE_BUFFER,
+                             static_cast<GParamFlags>(G_PARAM_READWRITE |
+                                                      G_PARAM_CONSTRUCT_ONLY));
+  g_object_class_install_property(gobject_class, PROP_BUFFER, spec);
 }
 
 /**
@@ -166,7 +198,7 @@ garrow_tensor_new(GArrowDataType *data_type,
 arrow_shape,
[jira] [Resolved] (ARROW-2251) [GLib] Destroying GArrowBuffer while GArrowTensor that uses the buffer causes a crash
[ https://issues.apache.org/jira/browse/ARROW-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-2251. - Resolution: Fixed Issue resolved by pull request 1691 [https://github.com/apache/arrow/pull/1691] > [GLib] Destroying GArrowBuffer while GArrowTensor that uses the buffer causes > a crash > - > > Key: ARROW-2251 > URL: https://issues.apache.org/jira/browse/ARROW-2251 > Project: Apache Arrow > Issue Type: Bug > Components: GLib >Affects Versions: 0.8.0 >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Minor > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2251) [GLib] Destroying GArrowBuffer while GArrowTensor that uses the buffer causes a crash
[ https://issues.apache.org/jira/browse/ARROW-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385595#comment-16385595 ] ASF GitHub Bot commented on ARROW-2251: --- kou commented on issue #1691: ARROW-2251: [GLib] Keep GArrowBuffer alive while GArrowTensor for the buffer is live URL: https://github.com/apache/arrow/pull/1691#issuecomment-370307240 Partially right. `shared_ptr<arrow::Tensor>` keeps `shared_ptr<arrow::Buffer>` alive, but the memory in the `shared_ptr<arrow::Buffer>` may be freed when the `shared_ptr<arrow::Buffer>` just refers to external memory. It's caused by creating the `shared_ptr<arrow::Buffer>` with `Arrow::Buffer.new("...data...")` in Ruby. (It creates a `GArrowBuffer` in C.) The `"...data..."` should be alive while the `Arrow::Buffer` is alive. Without this change, only the `shared_ptr<arrow::Buffer>` is kept alive. With this change, both the `shared_ptr<arrow::Buffer>` and the `GArrowBuffer` are kept alive. The `GArrowBuffer` should keep the data alive. I'll send one more pull request to improve memory management in `GArrowBuffer`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [GLib] Destroying GArrowBuffer while GArrowTensor that uses the buffer causes > a crash > - > > Key: ARROW-2251 > URL: https://issues.apache.org/jira/browse/ARROW-2251 > Project: Apache Arrow > Issue Type: Bug > Components: GLib >Affects Versions: 0.8.0 >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Minor > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
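The lifetime rule discussed in this thread can be illustrated outside of GLib. What follows is only a minimal Python sketch under stated assumptions, not arrow-glib code: `Owner`, `View` and `owner_survives` are hypothetical names made up for the illustration. It shows that a dependent object which holds a strong reference to the object owning the memory (as GArrowTensor holds a GArrowBuffer after this change) keeps that owner from being collected.

```python
import gc
import weakref


class Owner:
    """Hypothetical stand-in for an object that owns a block of memory."""

    def __init__(self, data):
        self.data = bytearray(data)


class View:
    """Hypothetical dependent object that reads the owner's memory."""

    def __init__(self, owner):
        # Strong reference: the owner cannot be collected while the view lives.
        self._owner = owner
        self.memory = memoryview(owner.data)


def owner_survives(keep_reference):
    """Return True if the owner is still alive after the caller drops it."""
    owner = Owner(b"...data...")
    probe = weakref.ref(owner)  # observes the owner without keeping it alive
    view = View(owner) if keep_reference else None
    del owner
    gc.collect()
    alive = probe() is not None
    del view
    return alive
```

Calling `owner_survives(True)` models the situation with the change applied, and `owner_survives(False)` models the situation where nothing references the owner any more.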
[jira] [Updated] (ARROW-2254) [Python] Local in-place dev versions picking up JS tags
[ https://issues.apache.org/jira/browse/ARROW-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2254: Fix Version/s: (was: 0.9.0) 0.10.0 > [Python] Local in-place dev versions picking up JS tags > --- > > Key: ARROW-2254 > URL: https://issues.apache.org/jira/browse/ARROW-2254 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.10.0 > > > I thought we had fixed this bug, but it's back: > {code} > $ ipython > Python 3.5.2 | packaged by conda-forge | (default, Jul 26 2016, 01:32:08) > Type 'copyright', 'credits' or 'license' for more information > IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help. > In [1]: pa.__version__ > Out[1]: '0.3.1.dev52+g8b1c8118' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2252) [Python] Create buffer from address, size and base
[ https://issues.apache.org/jira/browse/ARROW-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385592#comment-16385592 ]

ASF GitHub Bot commented on ARROW-2252:
---

wesm closed pull request #1693: ARROW-2252: [Python] Create buffer from address, size and base
URL: https://github.com/apache/arrow/pull/1693

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/python/pyarrow/__init__.py b/python/pyarrow/__init__.py
index 15a37ca10..8cb4b3b9b 100644
--- a/python/pyarrow/__init__.py
+++ b/python/pyarrow/__init__.py
@@ -72,8 +72,8 @@ from pyarrow.lib import TimestampType
 
 
 # Buffers, allocation
-from pyarrow.lib import (Buffer, ResizableBuffer, compress, decompress,
-                         allocate_buffer, frombuffer)
+from pyarrow.lib import (Buffer, ForeignBuffer, ResizableBuffer, compress,
+                         decompress, allocate_buffer, frombuffer)
 
 from pyarrow.lib import (MemoryPool, total_allocated_bytes,
                          set_memory_pool, default_memory_pool,
diff --git a/python/pyarrow/io.pxi b/python/pyarrow/io.pxi
index 325c5827f..5c8411be4 100644
--- a/python/pyarrow/io.pxi
+++ b/python/pyarrow/io.pxi
@@ -720,6 +720,18 @@ cdef class Buffer:
         return self.size
 
 
+cdef class ForeignBuffer(Buffer):
+
+    def __init__(self, addr, size, base):
+        cdef:
+            intptr_t c_addr = addr
+            int64_t c_size = size
+        self.base = base
+        cdef shared_ptr[CBuffer] buffer = make_shared[CBuffer](
+            <uint8_t*> c_addr, c_size)
+        self.init(<shared_ptr[CBuffer]> buffer)
+
+
 cdef class ResizableBuffer(Buffer):
 
     cdef void init_rz(self, const shared_ptr[CResizableBuffer]& buffer):
diff --git a/python/pyarrow/lib.pxd b/python/pyarrow/lib.pxd
index e4d574f18..c37bc2beb 100644
--- a/python/pyarrow/lib.pxd
+++ b/python/pyarrow/lib.pxd
@@ -324,6 +324,11 @@ cdef class Buffer:
     cdef int _check_nullptr(self) except -1
 
 
+cdef class ForeignBuffer(Buffer):
+    cdef:
+        object base
+
+
 cdef class ResizableBuffer(Buffer):
 
     cdef void init_rz(self, const shared_ptr[CResizableBuffer]& buffer)
diff --git a/python/pyarrow/tests/test_io.py b/python/pyarrow/tests/test_io.py
index d269ad0e7..17aca4333 100644
--- a/python/pyarrow/tests/test_io.py
+++ b/python/pyarrow/tests/test_io.py
@@ -24,6 +24,7 @@ import weakref
 
 
 import numpy as np
+import numpy.testing as npt
 import pandas as pd
 
 
@@ -253,6 +254,14 @@ def test_buffer_equals():
     assert buf2.equals(buf5)
 
 
+def test_foreign_buffer():
+    n = np.array([1, 2])
+    addr = n.__array_interface__["data"][0]
+    size = n.nbytes
+    fb = pa.ForeignBuffer(addr, size, n)
+    npt.assert_array_equal(np.asarray(fb), n.view(dtype=np.int8))
+
+
 def test_allocate_buffer():
     buf = pa.allocate_buffer(100)
     assert buf.size == 100

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> [Python] Create buffer from address, size and base
> --
>
> Key: ARROW-2252
> URL: https://issues.apache.org/jira/browse/ARROW-2252
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Reporter: Uwe L. Korn
> Assignee: Uwe L. Korn
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Given a memory address and a size, we should be able to construct an Arrow
> buffer from this. The additional base object will be used to hold a reference
> to the underlying, original buffer so that it does not go out of scope before
> the Arrow buffer.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-2252) [Python] Create buffer from address, size and base
[ https://issues.apache.org/jira/browse/ARROW-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-2252. - Resolution: Fixed Issue resolved by pull request 1693 [https://github.com/apache/arrow/pull/1693] > [Python] Create buffer from address, size and base > -- > > Key: ARROW-2252 > URL: https://issues.apache.org/jira/browse/ARROW-2252 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Given a memory address and a size, we should be able to construct an Arrow > buffer from this. The additional base object will be used to hold a reference > to the underlying, original buffer so that it does not go out of scope before > the Arrow buffer. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
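The "address, size and base" idea from the issue description can be tried with the standard library alone. The sketch below is not the pyarrow implementation; `ForeignView` is a hypothetical class written for this illustration, and it uses `ctypes` where pyarrow would construct an Arrow buffer. It only assumes that the caller passes a valid address and size, and that `base` is the object that really owns that memory.

```python
import ctypes


class ForeignView:
    """Hypothetical view over `size` bytes at `addr`, kept valid by `base`."""

    def __init__(self, addr, size, base):
        # Hold the original owner so the memory is not freed under the view.
        self.base = base
        # Map the existing memory without copying it.
        self._bytes = (ctypes.c_ubyte * size).from_address(addr)

    def to_bytes(self):
        """Copy the viewed memory out as an immutable bytes object."""
        return bytes(self._bytes)


# The owner of the memory: a small ctypes array of four bytes.
source = (ctypes.c_ubyte * 4)(1, 2, 3, 4)
view = ForeignView(ctypes.addressof(source), ctypes.sizeof(source), source)
```

Because the view maps the memory instead of copying it, a later write to `source` is visible through `view.to_bytes()`. Without the `base` reference, dropping `source` would leave the view pointing at freed memory.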
[jira] [Created] (ARROW-2258) [C++] Appveyor builds failing on master
Wes McKinney created ARROW-2258: --- Summary: [C++] Appveyor builds failing on master Key: ARROW-2258 URL: https://issues.apache.org/jira/browse/ARROW-2258 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Wes McKinney Fix For: 0.9.0 See https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/build/1.0.5563 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2257) [C++] Add high-level option to toggle CXX11 ABI
Wes McKinney created ARROW-2257:
---

Summary: [C++] Add high-level option to toggle CXX11 ABI
Key: ARROW-2257
URL: https://issues.apache.org/jira/browse/ARROW-2257
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Wes McKinney
Fix For: 0.9.0

Using gcc-4.8-based toolchain libraries from conda-forge I ran into the following failure when building on Ubuntu 16.04 with clang-5.0

{code}
[48/48] Linking CXX executable debug/python-test
FAILED: debug/python-test
: && /usr/bin/ccache /usr/bin/clang++-5.0 -ggdb -O0 -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-deprecated -Wno-weak-vtables -Wno-padded -Wno-comma -Wno-unused-parameter -Wno-unused-template -Wno-undef -Wno-shadow -Wno-switch-enum -Wno-exit-time-destructors -Wno-global-constructors -Wno-weak-template-vtables -Wno-undefined-reinterpret-cast -Wno-implicit-fallthrough -Wno-unreachable-code-return -Wno-float-equal -Wno-missing-prototypes -Wno-old-style-cast -Wno-covered-switch-default -Wno-cast-align -Wno-vla-extension -Wno-shift-sign-overflow -Wno-used-but-marked-unused -Wno-missing-variable-declarations -Wno-gnu-zero-variadic-macro-arguments -Wconversion -Wno-sign-conversion -Wno-disabled-macro-expansion -Wno-gnu-folding-constant -Wno-reserved-id-macro -Wno-range-loop-analysis -Wno-double-promotion -Wno-undefined-func-template -Wno-zero-as-null-pointer-constant -Wno-unknown-warning-option -Werror -std=c++11 -msse3 -maltivec -Werror -D_GLIBCXX_USE_CXX11_ABI=0 -Qunused-arguments -fsanitize=address -DADDRESS_SANITIZER -fsanitize-coverage=trace-pc-guard -g -rdynamic src/arrow/python/CMakeFiles/python-test.dir/python-test.cc.o -o debug/python-test -Wl,-rpath,/home/wesm/code/arrow/cpp/build/debug:/home/wesm/miniconda/envs/arrow-dev/lib:/home/wesm/cpp-toolchain/lib debug/libarrow_python_test_main.a debug/libarrow_python.a debug/libarrow.so.0.0.0 /home/wesm/miniconda/envs/arrow-dev/lib/libpython3.6m.so /home/wesm/cpp-toolchain/lib/libgtest.a -lpthread -ldl orc_ep-install/lib/liborc.a /home/wesm/cpp-toolchain/lib/libprotobuf.a /home/wesm/cpp-toolchain/lib/libzstd.a /home/wesm/cpp-toolchain/lib/libz.a /home/wesm/cpp-toolchain/lib/libsnappy.a /home/wesm/cpp-toolchain/lib/liblz4.a /home/wesm/cpp-toolchain/lib/libbrotlidec-static.a /home/wesm/cpp-toolchain/lib/libbrotlienc-static.a /home/wesm/cpp-toolchain/lib/libbrotlicommon-static.a -lpthread -Wl,-rpath-link,/home/wesm/cpp-toolchain/lib && :
debug/libarrow.so.0.0.0: undefined reference to `orc::ParseError::ParseError(std::string const&)'
debug/libarrow.so.0.0.0: undefined reference to `google::protobuf::io::CodedOutputStream::WriteStringWithSizeToArray(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned char*)'
debug/libarrow.so.0.0.0: undefined reference to `google::protobuf::internal::WireFormatLite::WriteStringMaybeAliased(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, google::protobuf::io::CodedOutputStream*)'
debug/libarrow.so.0.0.0: undefined reference to `google::protobuf::internal::fixed_address_empty_string[abi:cxx11]'
debug/libarrow.so.0.0.0: undefined reference to `google::protobuf::internal::WireFormatLite::ReadBytes(google::protobuf::io::CodedInputStream*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)'
debug/libarrow.so.0.0.0: undefined reference to `google::protobuf::Message::GetTypeName[abi:cxx11]() const'
debug/libarrow.so.0.0.0: undefined reference to `google::protobuf::Message::InitializationErrorString[abi:cxx11]() const'
debug/libarrow.so.0.0.0: undefined reference to `google::protobuf::MessageLite::SerializeToString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) const'
debug/libarrow.so.0.0.0: undefined reference to `google::protobuf::internal::WireFormatLite::WriteString(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, google::protobuf::io::CodedOutputStream*)'
debug/libarrow.so.0.0.0: undefined reference to `google::protobuf::MessageFactory::InternalRegisterGeneratedFile(char const*, void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&))'
debug/libarrow.so.0.0.0: undefined reference to `google::protobuf::internal::WireFormatLite::WriteBytesMaybeAliased(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, google::protobuf::io::CodedOutputStream*)'
debug/libarrow.so.0.0.0: undefined reference to `google::protobuf::internal::AssignDescriptors(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, google::protobuf::internal::MigrationSchema const*, google::protobuf::Message const* const*, unsigned int const*, google::protobuf::MessageFactory*, google::protobuf::Metadata*, google::protobuf::EnumDescriptor const**, google::protobuf::ServiceDescriptor const**)'
debug/libarrow.so.0.0.0: undefined reference to `google::protobuf::MessageLite::ParseFromString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
debug/libar
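When reading a link failure like the one above, it can help to sort the undefined symbols by whether their demangled names carry a marker of the newer libstdc++ string ABI (`std::__cxx11::` or the `[abi:cxx11]` tag). The helper below is a hypothetical triage sketch, not part of the Arrow build system; the names `uses_cxx11_abi` and `split_by_abi` are invented here. A symbol without a marker is not necessarily an old-ABI symbol, since it may simply not involve `std::string` at all.

```python
import re

# Markers that the GCC 5+ dual ABI leaves in demangled symbol names.
_NEW_ABI = re.compile(r"std::__cxx11::|\[abi:cxx11\]")

# Matches the symbol quoted in GNU ld "undefined reference to `...'" messages.
_UNDEFINED = re.compile(r"undefined reference to `([^']+)'")


def uses_cxx11_abi(symbol):
    """Return True if the demangled symbol mentions the new string ABI."""
    return _NEW_ABI.search(symbol) is not None


def split_by_abi(linker_output):
    """Group undefined symbols by whether they carry a new-ABI marker."""
    groups = {"new": [], "old_or_neutral": []}
    for match in _UNDEFINED.finditer(linker_output):
        symbol = match.group(1)
        key = "new" if uses_cxx11_abi(symbol) else "old_or_neutral"
        groups[key].append(symbol)
    return groups


sample = (
    "debug/libarrow.so.0.0.0: undefined reference to "
    "`orc::ParseError::ParseError(std::string const&)'\n"
    "debug/libarrow.so.0.0.0: undefined reference to "
    "`google::protobuf::Message::GetTypeName[abi:cxx11]() const'\n"
)
groups = split_by_abi(sample)
```

With the two sample lines taken from the log, the `orc::ParseError` constructor lands in `old_or_neutral` and the protobuf accessor lands in `new`, which suggests that the libraries on the link line were not all built with the same `_GLIBCXX_USE_CXX11_ABI` setting.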
[jira] [Updated] (ARROW-2256) [C++] Fuzzer builds fail out of the box on Ubuntu 16.04 using LLVM apt repos
[ https://issues.apache.org/jira/browse/ARROW-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2256: Description: I did a clean upgrade to 16.04 on one of my machines and ran into the problem described here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=866087 I think this can be resolved temporarily by symlinking the static library, but we should document the problem so other devs know what to do when it happens. was: I did a clean upgrade to 16.04 on one of my machines and ran into the problem described here: > [C++] Fuzzer builds fail out of the box on Ubuntu 16.04 using LLVM apt repos > > > Key: ARROW-2256 > URL: https://issues.apache.org/jira/browse/ARROW-2256 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.9.0 > > > I did a clean upgrade to 16.04 on one of my machines and ran into the problem > described here: > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=866087 > I think this can be resolved temporarily by symlinking the static library, > but we should document the problem so other devs know what to do when it > happens. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2256) [C++] Fuzzer builds fail out of the box on Ubuntu 16.04 using LLVM apt repos
Wes McKinney created ARROW-2256: --- Summary: [C++] Fuzzer builds fail out of the box on Ubuntu 16.04 using LLVM apt repos Key: ARROW-2256 URL: https://issues.apache.org/jira/browse/ARROW-2256 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Wes McKinney Fix For: 0.9.0 I did a clean upgrade to 16.04 on one of my machines and ran into the problem described here: -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2167) [C++] Building Orc extensions fails with the default BUILD_WARNING_LEVEL=Production
[ https://issues.apache.org/jira/browse/ARROW-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2167: Issue Type: Bug (was: Improvement) > [C++] Building Orc extensions fails with the default > BUILD_WARNING_LEVEL=Production > --- > > Key: ARROW-2167 > URL: https://issues.apache.org/jira/browse/ARROW-2167 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.8.0 >Reporter: Phillip Cloud >Priority: Major > Fix For: 0.9.0 > > > Building orc_ep fails because there are a bunch of upstream warnings like not > providing {{override}} on virtual destructor subclasses, and using {{0}} as > the {{nullptr}} constant and the default {{BUILD_WARNING_LEVEL}} is > {{Production}} which includes {{-Wall}} (all warnings as errors). > I see that there are different possible options for {{BUILD_WARNING_LEVEL}} > so it's possible for developers to deal with this issue. > It seems easier to let EPs build with whatever the default warning level is > for the project rather than force our defaults on those projects. > Generally speaking, are we using our own CXX_FLAGS for EPs other than Orc? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-906) [C++] Serialize Field metadata to IPC metadata
[ https://issues.apache.org/jira/browse/ARROW-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-906: --- Fix Version/s: (was: 0.9.0) 0.10.0 > [C++] Serialize Field metadata to IPC metadata > -- > > Key: ARROW-906 > URL: https://issues.apache.org/jira/browse/ARROW-906 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Fix For: 0.10.0 > > > Follow up work to ARROW-898 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2150) [Python] array equality defaults to identity
[ https://issues.apache.org/jira/browse/ARROW-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2150: Issue Type: Bug (was: Improvement) > [Python] array equality defaults to identity > > > Key: ARROW-2150 > URL: https://issues.apache.org/jira/browse/ARROW-2150 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.8.0 >Reporter: Antoine Pitrou >Priority: Minor > Fix For: 0.9.0 > > > I'm not sure this is deliberate, but it doesn't look very desirable to me: > {code} > >>> pa.array([1,2,3], type=pa.int32()) == pa.array([1,2,3], type=pa.int32()) > False > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
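The behaviour reported here is what Python does for any class that does not define `__eq__`: `==` falls back to object identity. The sketch below uses two hypothetical classes, `IdentityArray` and `ValueArray`, to contrast the default with a value-based comparison. It makes no claim about how pyarrow implements or fixes array equality.

```python
class IdentityArray:
    """Hypothetical container that relies on the default object equality."""

    def __init__(self, values):
        self.values = list(values)


class ValueArray(IdentityArray):
    """Same container, but compared element by element."""

    def __eq__(self, other):
        if not isinstance(other, ValueArray):
            # Let Python try the reflected comparison or fall back to identity.
            return NotImplemented
        return self.values == other.values

    # Note: defining __eq__ without __hash__ makes instances unhashable.


same_by_default = IdentityArray([1, 2, 3]) == IdentityArray([1, 2, 3])
same_by_value = ValueArray([1, 2, 3]) == ValueArray([1, 2, 3])
```

Here `same_by_default` is `False` because the two operands are distinct objects, while `same_by_value` is `True` because their contents match.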
[jira] [Updated] (ARROW-1954) [Python] Add metadata accessor to pyarrow.Field
[ https://issues.apache.org/jira/browse/ARROW-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1954: Fix Version/s: (was: 0.9.0) 0.10.0 > [Python] Add metadata accessor to pyarrow.Field > --- > > Key: ARROW-1954 > URL: https://issues.apache.org/jira/browse/ARROW-1954 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Fix For: 0.10.0 > > > Depends on ARROW-906 for this data to survive IPC roundtrip -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2255) Serialize schema- and field-level custom metadata in integration test JSON format
Wes McKinney created ARROW-2255: --- Summary: Serialize schema- and field-level custom metadata in integration test JSON format Key: ARROW-2255 URL: https://issues.apache.org/jira/browse/ARROW-2255 Project: Apache Arrow Issue Type: Bug Components: C++, Java - Vectors Reporter: Wes McKinney I don't believe we are doing this at present. We should validate that each implementation properly handles the incoming metadata from other Arrow emitters -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1982) [Python] Return parquet statistics min/max as values instead of strings
[ https://issues.apache.org/jira/browse/ARROW-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385404#comment-16385404 ]

ASF GitHub Bot commented on ARROW-1982:
---

wesm opened a new pull request #1698: ARROW-1982: [Python] Coerce Parquet statistics as bytes to more useful Python scalar types
URL: https://github.com/apache/arrow/pull/1698

I also changed the BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY to return bytes since decoding from binary to UTF8 unicode didn't seem correct to me as the default behavior

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> [Python] Return parquet statistics min/max as values instead of strings
> ---
>
> Key: ARROW-1982
> URL: https://issues.apache.org/jira/browse/ARROW-1982
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Jim Crist
> Assignee: Wes McKinney
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently `min` and `max` column statistics are returned as formatted strings
> of the _physical type_. This makes using them in python a bit tricky, as the
> strings need to be parsed as the proper _logical type_. Observe:
> {code}
> In [20]: import pandas as pd
> In [21]: df = pd.DataFrame({'a': [1, 2, 3],
>     ...:                    'b': ['a', 'b', 'c'],
>     ...:                    'c': [pd.Timestamp('1991-01-01')]*3})
>     ...:
> In [22]: df.to_parquet('temp.parquet', engine='pyarrow')
> In [23]: from pyarrow import parquet as pq
> In [24]: f = pq.ParquetFile('temp.parquet')
> In [25]: rg = f.metadata.row_group(0)
> In [26]: rg.column(0).statistics.min  # string instead of integer
> Out[26]: '1'
> In [27]: rg.column(1).statistics.min  # weird space added after value due to formatter
> Out[27]: 'a '
> In [28]: rg.column(2).statistics.min  # formatted as physical type (int) instead of logical (datetime)
> Out[28]: '66268800'
> {code}
> Since the type information is known, it should be possible to convert these
> to arrow values instead of strings.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1982) [Python] Return parquet statistics min/max as values instead of strings
[ https://issues.apache.org/jira/browse/ARROW-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-1982:
--
Labels: pull-request-available (was: )

> [Python] Return parquet statistics min/max as values instead of strings
> ---
>
> Key: ARROW-1982
> URL: https://issues.apache.org/jira/browse/ARROW-1982
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Jim Crist
> Assignee: Wes McKinney
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently `min` and `max` column statistics are returned as formatted strings
> of the _physical type_. This makes using them in python a bit tricky, as the
> strings need to be parsed as the proper _logical type_. Observe:
> {code}
> In [20]: import pandas as pd
> In [21]: df = pd.DataFrame({'a': [1, 2, 3],
>     ...:                    'b': ['a', 'b', 'c'],
>     ...:                    'c': [pd.Timestamp('1991-01-01')]*3})
>     ...:
> In [22]: df.to_parquet('temp.parquet', engine='pyarrow')
> In [23]: from pyarrow import parquet as pq
> In [24]: f = pq.ParquetFile('temp.parquet')
> In [25]: rg = f.metadata.row_group(0)
> In [26]: rg.column(0).statistics.min  # string instead of integer
> Out[26]: '1'
> In [27]: rg.column(1).statistics.min  # weird space added after value due to formatter
> Out[27]: 'a '
> In [28]: rg.column(2).statistics.min  # formatted as physical type (int) instead of logical (datetime)
> Out[28]: '66268800'
> {code}
> Since the type information is known, it should be possible to convert these
> to arrow values instead of strings.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
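The conversion the reporter asks for, from a formatted physical value to a logical Python value, can be sketched with the standard library. This is a hypothetical illustration and not the code from the pull request: the type labels `"INT64"`, `"TIMESTAMP_MILLIS"` and `"UTF8"`, the `DECODERS` table and the `decode_statistic` helper are assumptions made for the example, and the timestamp is assumed to be stored as milliseconds since the Unix epoch.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _timestamp_from_millis(raw):
    """Interpret a decimal string as milliseconds since the Unix epoch."""
    return EPOCH + timedelta(milliseconds=int(raw))


# Hypothetical mapping from a logical type label to a decoder function.
DECODERS = {
    "INT64": int,
    "TIMESTAMP_MILLIS": _timestamp_from_millis,
    "UTF8": lambda raw: raw.rstrip(" "),  # drop the padding the formatter adds
}


def decode_statistic(raw, logical_type):
    """Convert a formatted statistic string into a Python value."""
    try:
        decoder = DECODERS[logical_type]
    except KeyError:
        raise ValueError("unsupported logical type: %s" % logical_type)
    return decoder(raw)
```

For example, `decode_statistic("1", "INT64")` yields the integer `1`, and `decode_statistic("662688000000", "TIMESTAMP_MILLIS")` yields midnight UTC on 1991-01-01, the date used in the report's DataFrame.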
[jira] [Updated] (ARROW-2195) [Plasma] Segfault when retrieving RecordBatch from plasma store
[ https://issues.apache.org/jira/browse/ARROW-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-2195:
Issue Type: Bug (was: Improvement)

> [Plasma] Segfault when retrieving RecordBatch from plasma store
> ---
>
> Key: ARROW-2195
> URL: https://issues.apache.org/jira/browse/ARROW-2195
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Philipp Moritz
> Priority: Major
> Fix For: 0.9.0
>
>
> It can be reproduced with the following script:
> {code:python}
> import pyarrow as pa
> import pyarrow.plasma as plasma
>
> def retrieve1():
>     client = plasma.connect('test', "", 0)
>     key = "keynumber1keynumber1"
>     pid = plasma.ObjectID(bytearray(key,'UTF-8'))
>     [buff] = client.get_buffers([pid])
>     batch = pa.RecordBatchStreamReader(buff).read_next_batch()
>     print(batch)
>     print(batch.schema)
>     print(batch[0])
>     return batch
>
> client = plasma.connect('test', "", 0)
> test1 = [1, 12, 23, 3, 21, 34]
> test1 = pa.array(test1, pa.int32())
> batch = pa.RecordBatch.from_arrays([test1], ['FIELD1'])
> key = "keynumber1keynumber1"
> pid = plasma.ObjectID(bytearray(key,'UTF-8'))
> sink = pa.MockOutputStream()
> stream_writer = pa.RecordBatchStreamWriter(sink, batch.schema)
> stream_writer.write_batch(batch)
> stream_writer.close()
> bff = client.create(pid, sink.size())
> stream = pa.FixedSizeBufferWriter(bff)
> writer = pa.RecordBatchStreamWriter(stream, batch.schema)
> writer.write_batch(batch)
> client.seal(pid)
> batch = retrieve1()
> print(batch)
> print(batch.schema)
> print(batch[0])
> {code}
>
> Preliminary backtrace:
>
> {code}
> CESS (code=1, address=0x38158)
>     frame #0: 0x00010e6457fc lib.so`__pyx_pw_7pyarrow_3lib_10Int32Value_1as_py(_object*, _object*) + 28
> lib.so`__pyx_pw_7pyarrow_3lib_10Int32Value_1as_py:
> ->  0x10e6457fc <+28>: movslq (%rdx,%rcx,4), %rdi
>     0x10e645800 <+32>: callq 0x10e698170 ; symbol stub for: PyInt_FromLong
>     0x10e645805 <+37>: testq %rax, %rax
>     0x10e645808 <+40>: je 0x10e64580c ; <+44>
> (lldb) bt
> * thread #1: tid = 0xf1378e, 0x00010e6457fc lib.so`__pyx_pw_7pyarrow_3lib_10Int32Value_1as_py(_object*, _object*) + 28, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x38158)
>   * frame #0: 0x00010e6457fc lib.so`__pyx_pw_7pyarrow_3lib_10Int32Value_1as_py(_object*, _object*) + 28
>     frame #1: 0x00010e5ccd35 lib.so`__Pyx_PyObject_CallNoArg(_object*) + 133
>     frame #2: 0x00010e613b25 lib.so`__pyx_pw_7pyarrow_3lib_10ArrayValue_3__repr__(_object*) + 933
>     frame #3: 0x00010c2f83bc libpython2.7.dylib`PyObject_Repr + 60
>     frame #4: 0x00010c35f651 libpython2.7.dylib`PyEval_EvalFrameEx + 22305
> {code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2254) [Python] Local in-place dev versions picking up JS tags
[ https://issues.apache.org/jira/browse/ARROW-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385383#comment-16385383 ] Uwe L. Korn commented on ARROW-2254: I think I have found the missing option; I will make a PR later. > [Python] Local in-place dev versions picking up JS tags > --- > > Key: ARROW-2254 > URL: https://issues.apache.org/jira/browse/ARROW-2254 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.9.0 > > > I thought we had fixed this bug, but it's back: > {code} > $ ipython > Python 3.5.2 | packaged by conda-forge | (default, Jul 26 2016, 01:32:08) > Type 'copyright', 'credits' or 'license' for more information > IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help. > In [1]: pa.__version__ > Out[1]: '0.3.1.dev52+g8b1c8118' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1929) [C++] Move various Arrow testing utility code from Parquet to Arrow codebase
[ https://issues.apache.org/jira/browse/ARROW-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-1929: -- Labels: pull-request-available (was: ) > [C++] Move various Arrow testing utility code from Parquet to Arrow codebase > > > Key: ARROW-1929 > URL: https://issues.apache.org/jira/browse/ARROW-1929 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > see https://github.com/apache/parquet-cpp/pull/426 and comments within -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1929) [C++] Move various Arrow testing utility code from Parquet to Arrow codebase
[ https://issues.apache.org/jira/browse/ARROW-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385382#comment-16385382 ] ASF GitHub Bot commented on ARROW-1929: --- wesm opened a new pull request #1697: ARROW-1929: [C++] Copy over testing utility code from PARQUET-1092 URL: https://github.com/apache/arrow/pull/1697 This code was introduced in parquet-cpp in https://github.com/apache/parquet-cpp/pull/426 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Move various Arrow testing utility code from Parquet to Arrow codebase > > > Key: ARROW-1929 > URL: https://issues.apache.org/jira/browse/ARROW-1929 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > see https://github.com/apache/parquet-cpp/pull/426 and comments within -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2238) [C++] Detect clcache in cmake configuration
[ https://issues.apache.org/jira/browse/ARROW-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385378#comment-16385378 ] ASF GitHub Bot commented on ARROW-2238: --- MaxRis commented on issue #1684: ARROW-2238: [C++] Detect and use clcache in cmake configuration URL: https://github.com/apache/arrow/pull/1684#issuecomment-370263126 @pitrou I will check if that resolves Appveyor build failure and let you know, thanks This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Detect clcache in cmake configuration > --- > > Key: ARROW-2238 > URL: https://issues.apache.org/jira/browse/ARROW-2238 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Minor > Labels: pull-request-available > > By default Windows builds should use clcache if installed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2238) [C++] Detect clcache in cmake configuration
[ https://issues.apache.org/jira/browse/ARROW-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385375#comment-16385375 ] ASF GitHub Bot commented on ARROW-2238: --- pitrou commented on issue #1684: ARROW-2238: [C++] Detect and use clcache in cmake configuration URL: https://github.com/apache/arrow/pull/1684#issuecomment-370262825 @MaxRis that sounds ok to me. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Detect clcache in cmake configuration > --- > > Key: ARROW-2238 > URL: https://issues.apache.org/jira/browse/ARROW-2238 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Minor > Labels: pull-request-available > > By default Windows builds should use clcache if installed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2199) [JAVA] Follow up fixes for ARROW-2019. Ensure density driven capacity is never less than 1 and propagate density throughout the vector tree
[ https://issues.apache.org/jira/browse/ARROW-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385368#comment-16385368 ] ASF GitHub Bot commented on ARROW-2199: --- siddharthteotia commented on issue #1646: ARROW-2199: [JAVA] Control the memory allocated for inner vectors in containers. URL: https://github.com/apache/arrow/pull/1646#issuecomment-370262328 Addressed review comments. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [JAVA] Follow up fixes for ARROW-2019. Ensure density driven capacity is > never less than 1 and propagate density throughout the vector tree > --- > > Key: ARROW-2199 > URL: https://issues.apache.org/jira/browse/ARROW-2199 > Project: Apache Arrow > Issue Type: Improvement > Components: Java - Vectors >Reporter: Siddharth Teotia >Assignee: Siddharth Teotia >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-2246) [Python] Use namespaced boost in manylinux1 package
[ https://issues.apache.org/jira/browse/ARROW-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-2246. - Resolution: Fixed Done in https://github.com/apache/arrow/commit/8b1c8118b017a941f0102709d72df7e5a9783aa4 > [Python] Use namespaced boost in manylinux1 package > --- > > Key: ARROW-2246 > URL: https://issues.apache.org/jira/browse/ARROW-2246 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Blocker > Fix For: 0.9.0 > > > Boost provides the functionality to generate a namespaced copy of all its > implementations. This means that you can have a private copy of Boost in your > library that will not come into conflict with other Boost installations in > your setting. While for e.g. conda-forge a good ecosystem exists that > provides the unique Boost version, in the setting of the manylinux1 wheels we > have no control over which other Boost version exists. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2254) [Python] Local in-place dev versions picking up JS tags
[ https://issues.apache.org/jira/browse/ARROW-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385361#comment-16385361 ] Wes McKinney commented on ARROW-2254: - We could also choose not to support development with {{build_ext --inplace}}, but it would be nice to support it if that is not too difficult. > [Python] Local in-place dev versions picking up JS tags > --- > > Key: ARROW-2254 > URL: https://issues.apache.org/jira/browse/ARROW-2254 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.9.0 > > > I thought we had fixed this bug, but it's back: > {code} > $ ipython > Python 3.5.2 | packaged by conda-forge | (default, Jul 26 2016, 01:32:08) > Type 'copyright', 'credits' or 'license' for more information > IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help. > In [1]: pa.__version__ > Out[1]: '0.3.1.dev52+g8b1c8118' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2199) [JAVA] Follow up fixes for ARROW-2019. Ensure density driven capacity is never less than 1 and propagate density throughout the vector tree
[ https://issues.apache.org/jira/browse/ARROW-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385363#comment-16385363 ]

ASF GitHub Bot commented on ARROW-2199:
---

siddharthteotia commented on a change in pull request #1646: ARROW-2199: [JAVA] Control the memory allocated for inner vectors in containers.
URL: https://github.com/apache/arrow/pull/1646#discussion_r172063380

## File path: java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java
##
@@ -174,14 +174,16 @@ public void setInitialCapacity(int valueCount) {
    * @param valueCount desired number of elements in the vector
    * @param density average number of bytes per variable width element
    */
+  @Override
   public void setInitialCapacity(int valueCount, double density) {
-    final long size = (long) (valueCount * density);
-    if (size < 1) {
-      throw new IllegalArgumentException("With the provided density and value count, potential capacity of the data buffer is 0");
-    }
+    long size = (long) (valueCount * density);
     if (size > MAX_ALLOCATION_SIZE) {
       throw new OversizedAllocationException("Requested amount of memory is more than max allowed");
     }
+
+    if(size == 0) {
+      size = 1;
+    }

Review comment:
Done.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> [JAVA] Follow up fixes for ARROW-2019. Ensure density driven capacity is never less than 1 and propagate density throughout the vector tree
> ---
>
>                 Key: ARROW-2199
>                 URL: https://issues.apache.org/jira/browse/ARROW-2199
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java - Vectors
>            Reporter: Siddharth Teotia
>            Assignee: Siddharth Teotia
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
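The sizing rule in the BaseVariableWidthVector change can be modelled outside Java. What follows is a minimal Python sketch under stated assumptions, not the Arrow implementation: `data_buffer_size` is a hypothetical name, and the value given to `MAX_ALLOCATION_SIZE` is an assumption made for illustration, since the diff does not show the real constant. Only the order of the checks follows the patch.

```python
# Assumption: stand-in value; the actual Java constant is not shown in the diff.
MAX_ALLOCATION_SIZE = 2**31 - 1


def data_buffer_size(value_count, density):
    """Model of the patched setInitialCapacity(valueCount, density) sizing."""
    # int() truncates toward zero, like the (long) cast in the Java code.
    size = int(value_count * density)
    if size > MAX_ALLOCATION_SIZE:
        raise OverflowError("Requested amount of memory is more than max allowed")
    # The patch clamps a computed size of 0 to 1 instead of throwing.
    if size == 0:
        size = 1
    return size


if __name__ == "__main__":
    # The case from the removed tests: 5 values at 0.1 bytes each.
    print(data_buffer_size(5, 0.1))
    print(data_buffer_size(100, 2.5))
```

The design choice is to treat a too-small density as a request for the minimum usable buffer rather than as a caller error.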
[jira] [Commented] (ARROW-2254) [Python] Local in-place dev versions picking up JS tags
[ https://issues.apache.org/jira/browse/ARROW-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385359#comment-16385359 ] Wes McKinney commented on ARROW-2254: - It's the same for me -- this is for in-place builds rather than installs, so we need to put the setuptools_scm version resolution code someplace where it can be used in pyarrow/__init__.py > [Python] Local in-place dev versions picking up JS tags > --- > > Key: ARROW-2254 > URL: https://issues.apache.org/jira/browse/ARROW-2254 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.9.0 > > > I thought we had fixed this bug, but it's back: > {code} > $ ipython > Python 3.5.2 | packaged by conda-forge | (default, Jul 26 2016, 01:32:08) > Type 'copyright', 'credits' or 'license' for more information > IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help. > In [1]: pa.__version__ > Out[1]: '0.3.1.dev52+g8b1c8118' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2199) [JAVA] Follow up fixes for ARROW-2019. Ensure density driven capacity is never less than 1 and propagate density throughout the vector tree
[ https://issues.apache.org/jira/browse/ARROW-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385356#comment-16385356 ]

ASF GitHub Bot commented on ARROW-2199:
---

siddharthteotia commented on a change in pull request #1646: ARROW-2199: [JAVA] Control the memory allocated for inner vectors in containers.
URL: https://github.com/apache/arrow/pull/1646#discussion_r172062968

## File path: java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java
##
@@ -174,14 +174,16 @@ public void setInitialCapacity(int valueCount) {
    * @param valueCount desired number of elements in the vector
    * @param density average number of bytes per variable width element
    */
+  @Override
   public void setInitialCapacity(int valueCount, double density) {
-    final long size = (long) (valueCount * density);
-    if (size < 1) {
-      throw new IllegalArgumentException("With the provided density and value count, potential capacity of the data buffer is 0");
-    }
+    long size = (long) (valueCount * density);
     if (size > MAX_ALLOCATION_SIZE) {
       throw new OversizedAllocationException("Requested amount of memory is more than max allowed");
     }
+
+    if(size == 0) {

Review comment:
Yes, we cannot have an initial capacity of 0, because then our safe* functions run into an infinite loop where they try to realloc and have the target buffer size as the next power of 2 -- BaseAllocator.nextPowerOfTwo returns 0 for 0, and thus the safe* functions keep calling realloc. This happens if the initial capacity was 0.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> [JAVA] Follow up fixes for ARROW-2019. Ensure density driven capacity is never less than 1 and propagate density throughout the vector tree
> ---
>
>                 Key: ARROW-2199
>                 URL: https://issues.apache.org/jira/browse/ARROW-2199
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java - Vectors
>            Reporter: Siddharth Teotia
>            Assignee: Siddharth Teotia
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
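The failure mode described in the review comment above can be reproduced with a small model. This is a Python sketch under stated assumptions rather than the Arrow code: `next_power_of_two` only imitates the behaviour the comment attributes to BaseAllocator.nextPowerOfTwo (0 maps to 0), `realloc_until` is a hypothetical stand-in for the realloc loop inside the safe* setters, and the `max_reallocs` cut-off exists only so the demonstration terminates.

```python
def next_power_of_two(value):
    """Smallest power of two >= value; 0 maps to 0, as the comment describes."""
    if value <= 0:
        return 0
    return 1 << (value - 1).bit_length()


def realloc_until(capacity, needed, max_reallocs=64):
    """Grow by doubling and rounding up until `needed` fits.

    Returns (final_capacity, number_of_reallocs). Raises RuntimeError once
    max_reallocs is reached, where the real loop would spin forever.
    """
    reallocs = 0
    while capacity < needed:
        if reallocs == max_reallocs:
            raise RuntimeError(
                "capacity stuck at %d after %d reallocs" % (capacity, reallocs)
            )
        capacity = next_power_of_two(capacity * 2)
        reallocs += 1
    return capacity, reallocs


if __name__ == "__main__":
    # Starting from 1 the buffer grows normally.
    print(realloc_until(1, 5))
    # Starting from 0 it never grows: 0 * 2 == 0 and next_power_of_two(0) == 0.
    try:
        realloc_until(0, 5)
    except RuntimeError as exc:
        print("no progress:", exc)
```

Doubling zero yields zero, and rounding zero up to a power of two leaves it unchanged, so the capacity never advances. That is why the patch forces the initial capacity to at least 1.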
[jira] [Commented] (ARROW-2199) [JAVA] Follow up fixes for ARROW-2019. Ensure density driven capacity is never less than 1 and propagate density throughout the vector tree
[ https://issues.apache.org/jira/browse/ARROW-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385355#comment-16385355 ] ASF GitHub Bot commented on ARROW-2199: --- siddharthteotia commented on a change in pull request #1646: ARROW-2199: [JAVA] Control the memory allocated for inner vectors in containers. URL: https://github.com/apache/arrow/pull/1646#discussion_r172063231 ## File path: java/vector/src/main/codegen/templates/UnionVector.java ## @@ -282,6 +282,7 @@ private void reallocTypeBuffer() { long newAllocationSize = baseSize * 2L; newAllocationSize = BaseAllocator.nextPowerOfTwo(newAllocationSize); +newAllocationSize = Math.max(newAllocationSize, 1); Review comment: Now that setInitialCapacity is safeguarded to not allow an initial capacity less than 1, we may not hit this case, but I think it is better to do the check in realloc as well; otherwise we will run into an infinite loop. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [JAVA] Follow up fixes for ARROW-2019. Ensure density driven capacity is > never less than 1 and propagate density throughout the vector tree > --- > > Key: ARROW-2199 > URL: https://issues.apache.org/jira/browse/ARROW-2199 > Project: Apache Arrow > Issue Type: Improvement > Components: Java - Vectors >Reporter: Siddharth Teotia >Assignee: Siddharth Teotia >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
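The extra guard in reallocTypeBuffer can be sketched in the same way. This Python model is an illustration only: `new_allocation_size` is a hypothetical name for the three lines shown in the UnionVector diff, and `next_power_of_two` is an assumed imitation of BaseAllocator.nextPowerOfTwo that returns 0 for 0.

```python
def next_power_of_two(value):
    """Assumed behaviour: smallest power of two >= value, with 0 mapped to 0."""
    if value <= 0:
        return 0
    return 1 << (value - 1).bit_length()


def new_allocation_size(base_size):
    """Model of the patched size computation in reallocTypeBuffer."""
    size = base_size * 2
    size = next_power_of_two(size)
    # The added line: never hand back a zero-sized allocation.
    return max(size, 1)


if __name__ == "__main__":
    size = 0
    for _ in range(4):
        size = new_allocation_size(size)
        print(size)
```

With the floor in place, repeated reallocation from a zero-sized buffer makes progress on every call, so the check is a cheap second line of defence even if setInitialCapacity already prevents the zero case.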
[jira] [Commented] (ARROW-2199) [JAVA] Follow up fixes for ARROW-2019. Ensure density driven capacity is never less than 1 and propagate density throughout the vector tree
[ https://issues.apache.org/jira/browse/ARROW-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385352#comment-16385352 ] ASF GitHub Bot commented on ARROW-2199: --- siddharthteotia commented on a change in pull request #1646: ARROW-2199: [JAVA] Control the memory allocated for inner vectors in containers. URL: https://github.com/apache/arrow/pull/1646#discussion_r172063146 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestListVector.java ## @@ -810,15 +810,6 @@ public void testSetInitialCapacity() { vector.allocateNew(); assertEquals(512, vector.getValueCapacity()); assertEquals(8, vector.getDataVector().getValueCapacity()); - - boolean error = false; - try { -vector.setInitialCapacity(5, 0.1); Review comment: No. Earlier we were throwing IllegalArgumentException, but that shouldn't be done. Now we take a max and set it to 1. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [JAVA] Follow up fixes for ARROW-2019. Ensure density driven capacity is > never less than 1 and propagate density throughout the vector tree > --- > > Key: ARROW-2199 > URL: https://issues.apache.org/jira/browse/ARROW-2199 > Project: Apache Arrow > Issue Type: Improvement > Components: Java - Vectors >Reporter: Siddharth Teotia >Assignee: Siddharth Teotia >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean
[ https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385353#comment-16385353 ] Wes McKinney commented on ARROW-1491: - [~cpcloud] this would be nice to have, but relative to the bug backlog for 0.9.0 we could also defer this to the next release > [C++] Add casting implementations from strings to numbers or boolean > > > Key: ARROW-1491 > URL: https://issues.apache.org/jira/browse/ARROW-1491 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Assignee: Licht Takeuchi >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2199) [JAVA] Follow up fixes for ARROW-2019. Ensure density driven capacity is never less than 1 and propagate density throughout the vector tree
[ https://issues.apache.org/jira/browse/ARROW-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385354#comment-16385354 ] ASF GitHub Bot commented on ARROW-2199: --- siddharthteotia commented on a change in pull request #1646: ARROW-2199: [JAVA] Control the memory allocated for inner vectors in containers. URL: https://github.com/apache/arrow/pull/1646#discussion_r172063162 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java ## @@ -1933,15 +1933,6 @@ public void testSetInitialCapacity() { vector.allocateNew(); assertEquals(4096, vector.getValueCapacity()); assertEquals(64, vector.getDataBuffer().capacity()); - - boolean error = false; - try { -vector.setInitialCapacity(5, 0.1); Review comment: We take a max and set it to 1 if needed. Exception handling is not needed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [JAVA] Follow up fixes for ARROW-2019. Ensure density driven capacity is > never less than 1 and propagate density throughout the vector tree > --- > > Key: ARROW-2199 > URL: https://issues.apache.org/jira/browse/ARROW-2199 > Project: Apache Arrow > Issue Type: Improvement > Components: Java - Vectors >Reporter: Siddharth Teotia >Assignee: Siddharth Teotia >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2199) [JAVA] Follow up fixes for ARROW-2019. Ensure density driven capacity is never less than 1 and propagate density throughout the vector tree
[ https://issues.apache.org/jira/browse/ARROW-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385349#comment-16385349 ]

ASF GitHub Bot commented on ARROW-2199:
---

siddharthteotia commented on a change in pull request #1646: ARROW-2199: [JAVA] Control the memory allocated for inner vectors in containers.
URL: https://github.com/apache/arrow/pull/1646#discussion_r172063122

## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/BaseRepeatedValueVector.java
##
@@ -166,13 +168,23 @@ public void setInitialCapacity(int numRecords) {
    *This helps in tightly controlling the memory we provision
    *for inner data vector.
    */
+  @Override
   public void setInitialCapacity(int numRecords, double density) {
+    if ((numRecords * density) >= Integer.MAX_VALUE) {
+      throw new OversizedAllocationException("Requested amount of memory is more than max allowed");
+    }
     offsetAllocationSizeInBytes = (numRecords + 1) * OFFSET_WIDTH;
-    final int innerValueCapacity = (int)(numRecords * density);
-    if (innerValueCapacity < 1) {
-      throw new IllegalArgumentException("With the provided density and value count, potential value capacity for the data vector is 0");
+    int innerValueCapacity = (int)(numRecords * density);
+
+    if(innerValueCapacity == 0) {
+      innerValueCapacity = 1;
+    }

Review comment:
Done.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> [JAVA] Follow up fixes for ARROW-2019. Ensure density driven capacity is never less than 1 and propagate density throughout the vector tree
> ---
>
>                 Key: ARROW-2199
>                 URL: https://issues.apache.org/jira/browse/ARROW-2199
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java - Vectors
>            Reporter: Siddharth Teotia
>            Assignee: Siddharth Teotia
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
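For the repeated-value (list) case, the patched method derives two numbers: the byte size of the offset buffer and the element capacity of the inner data vector. The Python sketch below models that arithmetic under stated assumptions. `repeated_vector_capacities` is a hypothetical name, `OFFSET_WIDTH = 4` is an assumption (the diff uses the constant without showing its value), and `INT_MAX` stands in for Java's Integer.MAX_VALUE.

```python
OFFSET_WIDTH = 4      # assumption: bytes per offset entry
INT_MAX = 2**31 - 1   # Java's Integer.MAX_VALUE


def repeated_vector_capacities(num_records, density):
    """Return (offset_buffer_bytes, inner_value_capacity) as in the patch."""
    if num_records * density >= INT_MAX:
        raise OverflowError("Requested amount of memory is more than max allowed")
    # One more offset than records: offsets mark the start and end of each list.
    offset_bytes = (num_records + 1) * OFFSET_WIDTH
    inner_capacity = int(num_records * density)  # truncates like the (int) cast
    if inner_capacity == 0:
        inner_capacity = 1
    return offset_bytes, inner_capacity


if __name__ == "__main__":
    print(repeated_vector_capacities(5, 0.1))
    print(repeated_vector_capacities(10, 2.0))
```

Here density means the average number of inner elements per record, so a density below 1 describes mostly empty lists; the clamp keeps the inner vector allocatable in that case.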
[jira] [Updated] (ARROW-2244) [C++] Slicing NullArray should not cause the null count on the internal data to be unknown
[ https://issues.apache.org/jira/browse/ARROW-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-2244: -- Labels: pull-request-available (was: ) > [C++] Slicing NullArray should not cause the null count on the internal data > to be unknown > -- > > Key: ARROW-2244 > URL: https://issues.apache.org/jira/browse/ARROW-2244 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > see https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.cc#L101 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2254) [Python] Local in-place dev versions picking up JS tags
[ https://issues.apache.org/jira/browse/ARROW-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385350#comment-16385350 ]

Uwe L. Korn commented on ARROW-2254:

For reference, I get the following:
{code:java}
In [3]: import setuptools_scm.git
   ...: describe = setuptools_scm.git.DEFAULT_DESCRIBE + " --match 'apache-arrow-[0-9]*'"
   ...: command = describe.replace("--match *.*", "")
   ...:

In [4]: command
Out[4]: "git describe --dirty --tags --long --match 'apache-arrow-[0-9]*'"

In [5]: !git describe --dirty --tags --long --match 'apache-arrow-[0-9]*'
apache-arrow-0.8.0-214-g4ff04cf-dirty
{code}

> [Python] Local in-place dev versions picking up JS tags
> ---
>
>                 Key: ARROW-2254
>                 URL: https://issues.apache.org/jira/browse/ARROW-2254
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 0.9.0
>
> I thought we had fixed this bug, but it's back:
> {code}
> $ ipython
> Python 3.5.2 | packaged by conda-forge | (default, Jul 26 2016, 01:32:08)
> Type 'copyright', 'credits' or 'license' for more information
> IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.
> In [1]: pa.__version__
> Out[1]: '0.3.1.dev52+g8b1c8118'
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
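The effect of the `--match 'apache-arrow-[0-9]*'` pattern can be illustrated without running git. The sketch below uses Python's standard `fnmatch` module as an approximation of git's glob matching, which is an assumption: the two implementations are not guaranteed to agree in every corner case. The tag names in `TAGS` are hypothetical examples, chosen to resemble a main release tag and a JS release tag.

```python
from fnmatch import fnmatchcase

# Pattern taken from the describe command in the comment above.
PATTERN = "apache-arrow-[0-9]*"

# Hypothetical tag names for illustration only.
TAGS = ["apache-arrow-0.8.0", "apache-arrow-js-0.3.0"]


def matching_tags(tags, pattern=PATTERN):
    """Return the tags a glob-style --match filter would keep."""
    return [tag for tag in tags if fnmatchcase(tag, pattern)]


if __name__ == "__main__":
    print(matching_tags(TAGS))
```

Requiring a digit immediately after `apache-arrow-` rules out any tag that continues with a component name such as `js-`, which is consistent with the describe output above resolving to an `apache-arrow-0.8.0` based version.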
[jira] [Commented] (ARROW-2244) [C++] Slicing NullArray should not cause the null count on the internal data to be unknown
[ https://issues.apache.org/jira/browse/ARROW-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385351#comment-16385351 ] ASF GitHub Bot commented on ARROW-2244: --- wesm opened a new pull request #1696: ARROW-2244: [C++] Add unit test to explicitly check that NullArray internal data set correctly in Slice operations URL: https://github.com/apache/arrow/pull/1696 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Slicing NullArray should not cause the null count on the internal data > to be unknown > -- > > Key: ARROW-2244 > URL: https://issues.apache.org/jira/browse/ARROW-2244 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > see https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.cc#L101 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2254) [Python] Local in-place dev versions picking up JS tags
[ https://issues.apache.org/jira/browse/ARROW-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385348#comment-16385348 ]

Uwe L. Korn commented on ARROW-2254:

Please post the output from executing the value of {{command}}
{code:java}
import setuptools_scm.git
describe = setuptools_scm.git.DEFAULT_DESCRIBE + " --match 'apache-arrow-[0-9]*'"
command = describe.replace("--match *.*", "")
{code}

> [Python] Local in-place dev versions picking up JS tags
> ---
>
>                 Key: ARROW-2254
>                 URL: https://issues.apache.org/jira/browse/ARROW-2254
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 0.9.0
>
> I thought we had fixed this bug, but it's back:
> {code}
> $ ipython
> Python 3.5.2 | packaged by conda-forge | (default, Jul 26 2016, 01:32:08)
> Type 'copyright', 'credits' or 'license' for more information
> IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.
> In [1]: pa.__version__
> Out[1]: '0.3.1.dev52+g8b1c8118'
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (ARROW-2199) [JAVA] Follow up fixes for ARROW-2019. Ensure density driven capacity is never less than 1 and propagate density throughout the vector tree
[ https://issues.apache.org/jira/browse/ARROW-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385347#comment-16385347 ] ASF GitHub Bot commented on ARROW-2199: --- siddharthteotia commented on a change in pull request #1646: ARROW-2199: [JAVA] Control the memory allocated for inner vectors in containers. URL: https://github.com/apache/arrow/pull/1646#discussion_r172062999 ## File path: java/vector/src/main/java/org/apache/arrow/vector/DensityAwareVector.java ## @@ -0,0 +1,32 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.vector; + +/** + * Vector that support density aware initial capacity settings. + */ +public interface DensityAwareVector { + /** + * Set value with density + * @param valueCount + * @param density + */ + void setInitialCapacity(int valueCount, double density); + Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [JAVA] Follow up fixes for ARROW-2019. 
Ensure density driven capacity is > never less than 1 and propagate density throughout the vector tree > --- > > Key: ARROW-2199 > URL: https://issues.apache.org/jira/browse/ARROW-2199 > Project: Apache Arrow > Issue Type: Improvement > Components: Java - Vectors >Reporter: Siddharth Teotia >Assignee: Siddharth Teotia >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2199) [JAVA] Follow up fixes for ARROW-2019. Ensure density driven capacity is never less than 1 and propagate density throughout the vector tree
[ https://issues.apache.org/jira/browse/ARROW-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385345#comment-16385345 ] ASF GitHub Bot commented on ARROW-2199: --- siddharthteotia commented on a change in pull request #1646: ARROW-2199: [JAVA] Control the memory allocated for inner vectors in containers. URL: https://github.com/apache/arrow/pull/1646#discussion_r172062983 ## File path: java/vector/src/main/java/org/apache/arrow/vector/DensityAwareVector.java ## @@ -0,0 +1,32 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.vector; + +/** + * Vector that support density aware initial capacity settings. + */ +public interface DensityAwareVector { Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [JAVA] Follow up fixes for ARROW-2019. 
Ensure density driven capacity is > never less than 1 and propagate density throughout the vector tree > --- > > Key: ARROW-2199 > URL: https://issues.apache.org/jira/browse/ARROW-2199 > Project: Apache Arrow > Issue Type: Improvement > Components: Java - Vectors >Reporter: Siddharth Teotia >Assignee: Siddharth Teotia >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2199) [JAVA] Follow up fixes for ARROW-2019. Ensure density driven capacity is never less than 1 and propagate density throughout the vector tree
[ https://issues.apache.org/jira/browse/ARROW-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385344#comment-16385344 ] ASF GitHub Bot commented on ARROW-2199: --- siddharthteotia commented on a change in pull request #1646: ARROW-2199: [JAVA] Control the memory allocated for inner vectors in containers. URL: https://github.com/apache/arrow/pull/1646#discussion_r172062968 ## File path: java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java ## @@ -174,14 +174,16 @@ public void setInitialCapacity(int valueCount) { * @param valueCount desired number of elements in the vector * @param density average number of bytes per variable width element */ + @Override public void setInitialCapacity(int valueCount, double density) { -final long size = (long) (valueCount * density); -if (size < 1) { - throw new IllegalArgumentException("With the provided density and value count, potential capacity of the data buffer is 0"); -} +long size = (long) (valueCount * density); if (size > MAX_ALLOCATION_SIZE) { throw new OversizedAllocationException("Requested amount of memory is more than max allowed"); } + +if(size == 0) { Review comment: Yes, we cannot have an initial capacity of 0, because then our safe* functions run into an infinite loop: they try to realloc with the target buffer size set to the next power of 2, but BaseAllocator.nextPowerOfTwo returns 0 for 0, so the safe functions keep calling realloc. This happens whenever the initial capacity is 0. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [JAVA] Follow up fixes for ARROW-2019. 
Ensure density driven capacity is > never less than 1 and propagate density throughout the vector tree > --- > > Key: ARROW-2199 > URL: https://issues.apache.org/jira/browse/ARROW-2199 > Project: Apache Arrow > Issue Type: Improvement > Components: Java - Vectors >Reporter: Siddharth Teotia >Assignee: Siddharth Teotia >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
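The failure mode described in the review comment above can be sketched in a few lines. This is only an illustration in Python, not the Java code from the pull request: `next_power_of_two`, `initial_capacity` and `grow_until` are hypothetical helpers, and the only behavior taken from the comment is that the next power of two of 0 is 0, so a buffer that starts at capacity 0 never grows.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n; returns 0 for 0, as described in the comment."""
    if n <= 0:
        return 0
    return 1 << (n - 1).bit_length()


def initial_capacity(value_count: int, density: float) -> int:
    """Density-driven capacity, clamped so that it is never less than 1."""
    size = int(value_count * density)
    return max(size, 1)


def grow_until(capacity: int, needed: int, max_steps: int = 64) -> int:
    """Double the capacity until it fits `needed`.

    The step limit exists only so the sketch terminates; it stands in for
    the endless realloc loop that a capacity of 0 would otherwise cause.
    """
    steps = 0
    while capacity < needed:
        capacity = next_power_of_two(capacity * 2)
        steps += 1
        if steps >= max_steps:
            raise RuntimeError("capacity never grew; this would loop forever")
    return capacity
```

With the clamp, `grow_until(initial_capacity(10, 0.01), 8)` starts from 1 and reaches 8, whereas `grow_until(0, 8)` stays at 0 until the step limit trips.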
[jira] [Commented] (ARROW-2199) [JAVA] Follow up fixes for ARROW-2019. Ensure density driven capacity is never less than 1 and propagate density throughout the vector tree
[ https://issues.apache.org/jira/browse/ARROW-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385343#comment-16385343 ] ASF GitHub Bot commented on ARROW-2199: --- siddharthteotia commented on a change in pull request #1646: ARROW-2199: [JAVA] Control the memory allocated for inner vectors in containers. URL: https://github.com/apache/arrow/pull/1646#discussion_r172062884 ## File path: java/vector/src/main/java/org/apache/arrow/vector/DensityAwareVector.java ## @@ -0,0 +1,32 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.vector; + +/** + * Vector that support density aware initial capacity settings. Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [JAVA] Follow up fixes for ARROW-2019. 
Ensure density driven capacity is > never less than 1 and propagate density throughout the vector tree > --- > > Key: ARROW-2199 > URL: https://issues.apache.org/jira/browse/ARROW-2199 > Project: Apache Arrow > Issue Type: Improvement > Components: Java - Vectors >Reporter: Siddharth Teotia >Assignee: Siddharth Teotia >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2254) [Python] Local in-place dev versions picking up JS tags
[ https://issues.apache.org/jira/browse/ARROW-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385337#comment-16385337 ] Wes McKinney commented on ARROW-2254: - Upgraded to git 2.14.2 and the issue still seems to be present > [Python] Local in-place dev versions picking up JS tags > --- > > Key: ARROW-2254 > URL: https://issues.apache.org/jira/browse/ARROW-2254 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.9.0 > > > I thought we had fixed this bug, but it's back: > {code} > $ ipython > Python 3.5.2 | packaged by conda-forge | (default, Jul 26 2016, 01:32:08) > Type 'copyright', 'credits' or 'license' for more information > IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help. > In [1]: pa.__version__ > Out[1]: '0.3.1.dev52+g8b1c8118' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2254) [Python] Local in-place dev versions picking up JS tags
[ https://issues.apache.org/jira/browse/ARROW-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385335#comment-16385335 ] Wes McKinney commented on ARROW-2254: - Ubuntu 14.04, git 2.12.2. If upgrading git solves the issue, we can close this > [Python] Local in-place dev versions picking up JS tags > --- > > Key: ARROW-2254 > URL: https://issues.apache.org/jira/browse/ARROW-2254 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.9.0 > > > I thought we had fixed this bug, but it's back: > {code} > $ ipython > Python 3.5.2 | packaged by conda-forge | (default, Jul 26 2016, 01:32:08) > Type 'copyright', 'credits' or 'license' for more information > IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help. > In [1]: pa.__version__ > Out[1]: '0.3.1.dev52+g8b1c8118' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2254) [Python] Local in-place dev versions picking up JS tags
[ https://issues.apache.org/jira/browse/ARROW-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385334#comment-16385334 ] Uwe L. Korn commented on ARROW-2254: On which OS is this and which version of git are you using? > [Python] Local in-place dev versions picking up JS tags > --- > > Key: ARROW-2254 > URL: https://issues.apache.org/jira/browse/ARROW-2254 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.9.0 > > > I thought we had fixed this bug, but it's back: > {code} > $ ipython > Python 3.5.2 | packaged by conda-forge | (default, Jul 26 2016, 01:32:08) > Type 'copyright', 'credits' or 'license' for more information > IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help. > In [1]: pa.__version__ > Out[1]: '0.3.1.dev52+g8b1c8118' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-2228) [Python] Unsigned int type for arrow Table not supported
[ https://issues.apache.org/jira/browse/ARROW-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-2228. - Resolution: Duplicate Fix Version/s: 0.9.0 Can confirm this is fixed in master; it will be part of the 0.9.0 release > [Python] Unsigned int type for arrow Table not supported > > > Key: ARROW-2228 > URL: https://issues.apache.org/jira/browse/ARROW-2228 > Project: Apache Arrow > Issue Type: Bug > Components: Python > Environment: Ubuntu 16.04 > python3.6.3 >Reporter: Marcello >Assignee: Wes McKinney >Priority: Major > Fix For: 0.9.0 > > > Running this python one-liner > > {code:java} > // code pa.Table.from_pandas(pd.DataFrame({'foo': > [np.array([1000], dtype=np.uint64)]})) > {code} > I get > {code:java} > // code > --- > ArrowInvalid Traceback (most recent call last) > in () > > 1 pa.Table.from_pandas(pd.DataFrame({'foo': > [np.array([1000], dtype=np.uint64)]})) > ~/.virtualenvs/log-archive/lib/python3.6/site-packages/pyarrow/table.pxi in > pyarrow.lib.Table.from_pandas > (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:44927)() > ~/.virtualenvs/log-archive/lib/python3.6/site-packages/pyarrow/pandas_compat.py > in dataframe_to_arrays(df, schema, preserve_index, nthreads) > 348 arrays = [convert_column(c, t) > 349 for c, t in zip(columns_to_convert, > --> 350 convert_types)] > 351 else: > 352 from concurrent import futures > ~/.virtualenvs/log-archive/lib/python3.6/site-packages/pyarrow/pandas_compat.py > in (.0) > 347 if nthreads == 1: > 348 arrays = [convert_column(c, t) > --> 349 for c, t in zip(columns_to_convert, > 350 convert_types)] > 351 else: > ~/.virtualenvs/log-archive/lib/python3.6/site-packages/pyarrow/pandas_compat.py > in convert_column(col, ty) > 343 > 344 def convert_column(col, ty): > --> 345 return pa.array(col, from_pandas=True, type=ty) > 346 > 347 if nthreads == 1: > ~/.virtualenvs/log-archive/lib/python3.6/site-packages/pyarrow/array.pxi in > pyarrow.lib.array 
(/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:29224)() > ~/.virtualenvs/log-archive/lib/python3.6/site-packages/pyarrow/array.pxi in > pyarrow.lib._ndarray_to_array > (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:28465)() > ~/.virtualenvs/log-archive/lib/python3.6/site-packages/pyarrow/error.pxi in > pyarrow.lib.check_status > (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:8270)() > ArrowInvalid: trying to convert NumPy type int64 but got uint64 > {code} > > the problem possibly relies on the fact that from_pandas doesn't handle the > conversion from python object to unsigned integer. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2254) [Python] Local in-place dev versions picking up JS tags
Wes McKinney created ARROW-2254: --- Summary: [Python] Local in-place dev versions picking up JS tags Key: ARROW-2254 URL: https://issues.apache.org/jira/browse/ARROW-2254 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Wes McKinney Fix For: 0.9.0 I thought we had fixed this bug, but it's back: {code} $ ipython Python 3.5.2 | packaged by conda-forge | (default, Jul 26 2016, 01:32:08) Type 'copyright', 'credits' or 'license' for more information IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help. In [1]: pa.__version__ Out[1]: '0.3.1.dev52+g8b1c8118' {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-2228) [Python] Unsigned int type for arrow Table not supported
[ https://issues.apache.org/jira/browse/ARROW-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-2228: --- Assignee: Wes McKinney > [Python] Unsigned int type for arrow Table not supported > > > Key: ARROW-2228 > URL: https://issues.apache.org/jira/browse/ARROW-2228 > Project: Apache Arrow > Issue Type: Bug > Components: Python > Environment: Ubuntu 16.04 > python3.6.3 >Reporter: Marcello >Assignee: Wes McKinney >Priority: Major > > Running this python one-liner > > {code:java} > // code pa.Table.from_pandas(pd.DataFrame({'foo': > [np.array([1000], dtype=np.uint64)]})) > {code} > I get > {code:java} > // code > --- > ArrowInvalid Traceback (most recent call last) > in () > > 1 pa.Table.from_pandas(pd.DataFrame({'foo': > [np.array([1000], dtype=np.uint64)]})) > ~/.virtualenvs/log-archive/lib/python3.6/site-packages/pyarrow/table.pxi in > pyarrow.lib.Table.from_pandas > (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:44927)() > ~/.virtualenvs/log-archive/lib/python3.6/site-packages/pyarrow/pandas_compat.py > in dataframe_to_arrays(df, schema, preserve_index, nthreads) > 348 arrays = [convert_column(c, t) > 349 for c, t in zip(columns_to_convert, > --> 350 convert_types)] > 351 else: > 352 from concurrent import futures > ~/.virtualenvs/log-archive/lib/python3.6/site-packages/pyarrow/pandas_compat.py > in (.0) > 347 if nthreads == 1: > 348 arrays = [convert_column(c, t) > --> 349 for c, t in zip(columns_to_convert, > 350 convert_types)] > 351 else: > ~/.virtualenvs/log-archive/lib/python3.6/site-packages/pyarrow/pandas_compat.py > in convert_column(col, ty) > 343 > 344 def convert_column(col, ty): > --> 345 return pa.array(col, from_pandas=True, type=ty) > 346 > 347 if nthreads == 1: > ~/.virtualenvs/log-archive/lib/python3.6/site-packages/pyarrow/array.pxi in > pyarrow.lib.array (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:29224)() > ~/.virtualenvs/log-archive/lib/python3.6/site-packages/pyarrow/array.pxi 
in > pyarrow.lib._ndarray_to_array > (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:28465)() > ~/.virtualenvs/log-archive/lib/python3.6/site-packages/pyarrow/error.pxi in > pyarrow.lib.check_status > (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:8270)() > ArrowInvalid: trying to convert NumPy type int64 but got uint64 > {code} > > the problem possibly relies on the fact that from_pandas doesn't handle the > conversion from python object to unsigned integer. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-2245) [Python] Revert static linkage of parquet-cpp in manylinux1 wheel
[ https://issues.apache.org/jira/browse/ARROW-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-2245. - Resolution: Fixed Issue resolved by pull request 1692 [https://github.com/apache/arrow/pull/1692] > [Python] Revert static linkage of parquet-cpp in manylinux1 wheel > - > > Key: ARROW-2245 > URL: https://issues.apache.org/jira/browse/ARROW-2245 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Although in theory we are not the authoritative source of > parquet-cpp with the pyarrow manylinux1 wheel, in practice we > are, and statically linking parquet-cpp can introduce some problems that > dynamically linking it does not (e.g. duplicate unloading of the library if > you include it in a Python wheel and in the process that creates the Python > interpreter). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2245) [Python] Revert static linkage of parquet-cpp in manylinux1 wheel
[ https://issues.apache.org/jira/browse/ARROW-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385315#comment-16385315 ] ASF GitHub Bot commented on ARROW-2245: --- wesm closed pull request #1692: ARROW-2245: ARROW-2246: [Python] Revert static linkage of parquet-cpp in manylinux1 wheel URL: https://github.com/apache/arrow/pull/1692 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake index 4103af41b..c330e2ae3 100644 --- a/cpp/cmake_modules/ThirdpartyToolchain.cmake +++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake @@ -501,10 +501,11 @@ if (ARROW_JEMALLOC) set(JEMALLOC_SHARED_LIB "${JEMALLOC_PREFIX}/lib/libjemalloc${CMAKE_SHARED_LIBRARY_SUFFIX}") set(JEMALLOC_STATIC_LIB "${JEMALLOC_PREFIX}/lib/libjemalloc_pic${CMAKE_STATIC_LIBRARY_SUFFIX}") set(JEMALLOC_VENDORED 1) + # We need to disable TLS or otherwise C++ exceptions won't work anymore. 
ExternalProject_Add(jemalloc_ep URL ${CMAKE_CURRENT_SOURCE_DIR}/thirdparty/jemalloc/${JEMALLOC_VERSION}.tar.gz PATCH_COMMAND touch doc/jemalloc.3 doc/jemalloc.html -CONFIGURE_COMMAND ./autogen.sh "--prefix=${JEMALLOC_PREFIX}" "--with-jemalloc-prefix=je_arrow_" "--with-private-namespace=je_arrow_private_" +CONFIGURE_COMMAND ./autogen.sh "--prefix=${JEMALLOC_PREFIX}" "--with-jemalloc-prefix=je_arrow_" "--with-private-namespace=je_arrow_private_" "--disable-tls" ${EP_LOG_OPTIONS} BUILD_IN_SOURCE 1 BUILD_COMMAND ${MAKE} diff --git a/cpp/src/plasma/CMakeLists.txt b/cpp/src/plasma/CMakeLists.txt index 3448d009c..bc00f9806 100644 --- a/cpp/src/plasma/CMakeLists.txt +++ b/cpp/src/plasma/CMakeLists.txt @@ -124,6 +124,16 @@ endif() add_executable(plasma_store store.cc) target_link_libraries(plasma_store plasma_static ${PLASMA_LINK_LIBS}) +if (ARROW_RPATH_ORIGIN) + if (APPLE) +set(_lib_install_rpath "@loader_path") + else() +set(_lib_install_rpath "\$ORIGIN") + endif() + set_target_properties(plasma_store PROPERTIES + INSTALL_RPATH ${_lib_install_rpath}) +endif() + # Headers: top level install(FILES common.h diff --git a/python/CMakeLists.txt b/python/CMakeLists.txt index e9de08ba1..72294d494 100644 --- a/python/CMakeLists.txt +++ b/python/CMakeLists.txt @@ -66,7 +66,7 @@ if("${CMAKE_SOURCE_DIR}" STREQUAL "${CMAKE_CURRENT_SOURCE_DIR}") ON) option(PYARROW_BOOST_USE_SHARED "Rely on boost shared libraries on linking static parquet" -OFF) +ON) option(PYARROW_BUILD_PLASMA "Build the PyArrow Plasma integration" OFF) @@ -235,6 +235,24 @@ function(bundle_arrow_implib library_path) COPYONLY) endfunction(bundle_arrow_implib) +function(bundle_boost_lib library_path) + get_filename_component(LIBRARY_NAME ${${library_path}} NAME) + get_filename_component(LIBRARY_NAME_WE ${${library_path}} NAME_WE) + configure_file(${${library_path}} + ${BUILD_OUTPUT_ROOT_DIRECTORY}/${LIBRARY_NAME} + COPYONLY) + set(Boost_SO_VERSION "${Boost_MAJOR_VERSION}.${Boost_MINOR_VERSION}.${Boost_SUBMINOR_VERSION}") 
+ if (APPLE) +configure_file(${${library_path}} + ${BUILD_OUTPUT_ROOT_DIRECTORY}/${LIBRARY_NAME_WE}.${Boost_SO_VERSION}${CMAKE_SHARED_LIBRARY_SUFFIX} +COPYONLY) + else() +configure_file(${${library_path}} + ${BUILD_OUTPUT_ROOT_DIRECTORY}/${LIBRARY_NAME_WE}${CMAKE_SHARED_LIBRARY_SUFFIX}.${Boost_SO_VERSION} +COPYONLY) + endif() +endfunction() + # Always bundle includes file(COPY ${ARROW_INCLUDE_DIR}/arrow DESTINATION ${BUILD_OUTPUT_ROOT_DIRECTORY}/include) @@ -247,6 +265,15 @@ if (PYARROW_BUNDLE_ARROW_CPP) ABI_VERSION ${ARROW_ABI_VERSION} SO_VERSION ${ARROW_SO_VERSION}) + # boost + if (PYARROW_BOOST_USE_SHARED) +set(Boost_USE_STATIC_LIBS OFF) +find_package(Boost COMPONENTS system filesystem regex REQUIRED) +bundle_boost_lib(Boost_REGEX_LIBRARY) +bundle_boost_lib(Boost_FILESYSTEM_LIBRARY) +bundle_boost_lib(Boost_SYSTEM_LIBRARY) + endif() + if (MSVC) bundle_arrow_implib(ARROW_SHARED_IMP_LIB) bundle_arrow_implib(ARROW_PYTHON_SHARED_IMP_LIB) diff --git a/python/manylinux1/Dockerfile-x86_64 b/python/manylinux1/Dockerfile-x86_64 index 62a089329..d48bd0d2c 100644 --- a/python/manylinux1/Dockerfile-x86_64 +++ b/python/manylinux1/Dockerfile-x86_64 @@ -14,13 +14,13 @@ # KIND, either express or implied. See the License for the # specific language governing permissions and limitations # under the License. -FROM quay.io/xhochy/arrow_manylinux1_x86_64_base:ARROW-2212 +FROM quay.io/xhochy/arrow_manylinux1_x86_64_base:ARROW-2245 ADD arrow /arrow WORKDIR /arrow/cpp RUN mkdir bui
[jira] [Commented] (ARROW-2253) [Python] Support __eq__ on scalar values
[ https://issues.apache.org/jira/browse/ARROW-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385294#comment-16385294 ] ASF GitHub Bot commented on ARROW-2253: --- xhochy opened a new pull request #1695: ARROW-2253: [Python] Support __eq__ on scalar values URL: https://github.com/apache/arrow/pull/1695 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Support __eq__ on scalar values > > > Key: ARROW-2253 > URL: https://issues.apache.org/jira/browse/ARROW-2253 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Support a generic {{__eq__}} method on the {{ArrayValue}} class. We might want > to specialise it in the future in C++ to avoid some copies, but as a first > attempt delegate the comparison to the Python types. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2253) [Python] Support __eq__ on scalar values
[ https://issues.apache.org/jira/browse/ARROW-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-2253: -- Labels: pull-request-available (was: ) > [Python] Support __eq__ on scalar values > > > Key: ARROW-2253 > URL: https://issues.apache.org/jira/browse/ARROW-2253 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Support a generic {{__eq__}} method on the {{ArrayValue}} class. We might want > to specialise it in the future in C++ to avoid some copies, but as a first > attempt delegate the comparison to the Python types. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2253) [Python] Support __eq__ on scalar values
Uwe L. Korn created ARROW-2253: -- Summary: [Python] Support __eq__ on scalar values Key: ARROW-2253 URL: https://issues.apache.org/jira/browse/ARROW-2253 Project: Apache Arrow Issue Type: New Feature Components: Python Reporter: Uwe L. Korn Assignee: Uwe L. Korn Fix For: 0.9.0 Support a generic {{__eq__}} method on the {{ArrayValue}} class. We might want to specialise it in the future in C++ to avoid some copies, but as a first attempt delegate the comparison to the Python types. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
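The idea of delegating the comparison to the Python types might look roughly like the sketch below. `ScalarValue` and its `as_py` method are hypothetical stand-ins used for illustration; this is not the pyarrow `ArrayValue` implementation from the pull request.

```python
class ScalarValue:
    """Hypothetical scalar wrapper; stands in for a class like ArrayValue."""

    def __init__(self, value):
        self._value = value

    def as_py(self):
        """Return the wrapped value as a plain Python object."""
        return self._value

    def __eq__(self, other):
        # Unwrap another scalar first, then let the Python types compare.
        if isinstance(other, ScalarValue):
            other = other.as_py()
        return self.as_py() == other

    def __hash__(self):
        # Defining __eq__ disables the default hash, so restore one that
        # is consistent with equality.
        return hash(self.as_py())
```

With this, `ScalarValue(1) == 1` and `ScalarValue("a") == ScalarValue("a")` both hold, at the cost of converting to a Python object on every comparison, which is the copy the issue suggests avoiding later in C++.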
[jira] [Updated] (ARROW-2241) [Python] Simple script for running all current ASV benchmarks at a commit or tag
[ https://issues.apache.org/jira/browse/ARROW-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2241: Fix Version/s: (was: 0.9.0) 0.10.0 > [Python] Simple script for running all current ASV benchmarks at a commit or > tag > > > Key: ARROW-2241 > URL: https://issues.apache.org/jira/browse/ARROW-2241 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.10.0 > > > The objective of this is to be able to get a graph for performance at each > release tag for the currently-defined benchmarks (including benchmarks that > did not exist in older tags) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2248) [Python] Nightly or on-demand HDFS test builds
[ https://issues.apache.org/jira/browse/ARROW-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2248: Fix Version/s: (was: 0.9.0) 0.10.0 > [Python] Nightly or on-demand HDFS test builds > -- > > Key: ARROW-2248 > URL: https://issues.apache.org/jira/browse/ARROW-2248 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.10.0 > > > We continue to acquire more functionality related to HDFS and Parquet. > Testing this, including tests that involve interoperability with other > systems, like Spark, will require some work outside of our normal CI > infrastructure. > I suggest we start with testing the C++/Python HDFS integration, which will > help with validating patches like ARROW-1643 > https://github.com/apache/arrow/pull/1668 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-2227) [Python] Table.from_pandas does not create chunked_arrays.
[ https://issues.apache.org/jira/browse/ARROW-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-2227: --- Assignee: Wes McKinney > [Python] Table.from_pandas does not create chunked_arrays. > -- > > Key: ARROW-2227 > URL: https://issues.apache.org/jira/browse/ARROW-2227 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.8.0 >Reporter: Chris Ellison >Assignee: Wes McKinney >Priority: Major > Fix For: 0.9.0 > > > When creating a large enough array, pyarrow raises an exception: > {code:java} > import numpy as np > import pandas as pd > import pyarrow as pa > x = list('1' * 2**31) > y = pd.DataFrame({'x': x}) > t = pa.Table.from_pandas(y) > # ArrowInvalid: BinaryArrow cannot contain more than 2147483646 bytes, have > 2147483647{code} > The array should be chunked for the user. As is, data frames with >2 GiB in > binary data will struggle to get into arrow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
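As a rough illustration of the chunking requested above, the sketch below splits a sequence of strings into groups whose encoded size stays under a byte limit. `chunk_by_bytes` is a hypothetical helper, not part of pyarrow; in practice the limit would be the per-array byte limit quoted in the error message.

```python
def chunk_by_bytes(values, max_bytes):
    """Group strings so that each group holds at most `max_bytes` of UTF-8 data."""
    chunks, current, current_bytes = [], [], 0
    for value in values:
        size = len(value.encode("utf-8"))
        if size > max_bytes:
            raise ValueError("a single value exceeds the per-chunk limit")
        # Start a new chunk when adding this value would overflow the limit.
        if current and current_bytes + size > max_bytes:
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(value)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks
```

Each resulting group could then be converted to its own array and the arrays combined into one chunked column, which is the behavior the issue asks `Table.from_pandas` to provide automatically.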
[jira] [Commented] (ARROW-1919) Plasma hanging if object id is not 20 bytes
[ https://issues.apache.org/jira/browse/ARROW-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385279#comment-16385279 ] Wes McKinney commented on ARROW-1919: - Arrow 0.9.0 should be released sometime this month > Plasma hanging if object id is not 20 bytes > --- > > Key: ARROW-1919 > URL: https://issues.apache.org/jira/browse/ARROW-1919 > Project: Apache Arrow > Issue Type: Bug >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Minor > Labels: pull-request-available > Fix For: 0.9.0 > > > This happens when plasma's capability to put an object with a user-defined > object id is used and the object id is not 20 bytes long. Plasma will hang > upon get in that case; we should raise an error instead. > See https://github.com/ray-project/ray/issues/1315 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
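The check the issue asks for can be sketched as a small validation step that runs before an id reaches the store. `OBJECT_ID_SIZE` and `check_object_id` are hypothetical names chosen for this illustration; only the 20-byte length comes from the issue.

```python
OBJECT_ID_SIZE = 20  # length in bytes, as stated in the issue


def check_object_id(object_id: bytes) -> bytes:
    """Return the id unchanged, or raise if it is not exactly 20 bytes long."""
    if len(object_id) != OBJECT_ID_SIZE:
        raise ValueError(
            "object id must be %d bytes, got %d" % (OBJECT_ID_SIZE, len(object_id))
        )
    return object_id
```

Failing early like this turns the hang on get into an immediate, descriptive error on put.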
[jira] [Assigned] (ARROW-2252) [Python] Create buffer from address, size and base
[ https://issues.apache.org/jira/browse/ARROW-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned ARROW-2252: -- Assignee: Uwe L. Korn > [Python] Create buffer from address, size and base > -- > > Key: ARROW-2252 > URL: https://issues.apache.org/jira/browse/ARROW-2252 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Given a memory address and a size, we should be able to construct an Arrow > buffer from this. The additional base object will be used to hold a reference > to the underlying, original buffer so that it does not go out of scope before > the Arrow buffer. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2252) [Python] Create buffer from address, size and base
[ https://issues.apache.org/jira/browse/ARROW-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-2252: -- Labels: pull-request-available (was: ) > [Python] Create buffer from address, size and base > -- > > Key: ARROW-2252 > URL: https://issues.apache.org/jira/browse/ARROW-2252 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Given a memory address and a size, we should be able to construct an Arrow > buffer from this. The additional base object will be used to hold a reference > to the underlying, original buffer so that it does not go out of scope before > the Arrow buffer. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2252) [Python] Create buffer from address, size and base
[ https://issues.apache.org/jira/browse/ARROW-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385205#comment-16385205 ] ASF GitHub Bot commented on ARROW-2252: --- xhochy opened a new pull request #1693: ARROW-2252: [Python] Create buffer from address, size and base URL: https://github.com/apache/arrow/pull/1693 Usage with Arrow Java vectors:
```
import sys

import jpype
import numpy as np
import pyarrow as pa

# Start JVM with Arrow and all of its dependencies.
jpype.startJVM(jpype.getDefaultJVMPath(), "-Djava.class.path=arrow-tools-0.9.0-SNAPSHOT-jar-with-dependencies.jar")

# Create vector
ra = jpype.JPackage("org").apache.arrow.memory.RootAllocator(sys.maxsize)
uint1 = jpype.JPackage("org").apache.arrow.vector.UInt1Vector("int", ra)
uint1.allocateNew(128)
for i in range(128):
    uint1.setSafe(i, i)
uint1.setValueCount(128)

# Access it in Python
addr = uint1.getDataBuffer().unwrap().memoryAddress()
size = uint1.getDataBuffer().unwrap().capacity()
# Base object: assumed here to be the vector that owns the memory, so that
# it stays alive for as long as the buffer is in use.
fb = pa.ForeignBuffer(addr, size, uint1)
np.asarray(fb)
```
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] Create buffer from address, size and base > -- > > Key: ARROW-2252 > URL: https://issues.apache.org/jira/browse/ARROW-2252 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Given a memory address and a size, we should be able to construct an Arrow > buffer from this. The additional base object will be used to hold a reference > to the underlying, original buffer so that it does not go out of scope before > the Arrow buffer. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2238) [C++] Detect clcache in cmake configuration
[ https://issues.apache.org/jira/browse/ARROW-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385197#comment-16385197 ] ASF GitHub Bot commented on ARROW-2238: --- MaxRis commented on issue #1684: ARROW-2238: [C++] Detect and use clcache in cmake configuration URL: https://github.com/apache/arrow/pull/1684#issuecomment-370238637 @pitrou it should be fine to call `set(CMAKE_CXX_COMPILER ${CLCACHE_FOUND})` only if the generator is `Ninja` or `NMake Makefiles`. This may also resolve the current AppVeyor failure. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Detect clcache in cmake configuration > --- > > Key: ARROW-2238 > URL: https://issues.apache.org/jira/browse/ARROW-2238 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Minor > Labels: pull-request-available > > By default Windows builds should use clcache if installed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2252) [Python] Create buffer from address, size and base
Uwe L. Korn created ARROW-2252: -- Summary: [Python] Create buffer from address, size and base Key: ARROW-2252 URL: https://issues.apache.org/jira/browse/ARROW-2252 Project: Apache Arrow Issue Type: New Feature Components: Python Reporter: Uwe L. Korn Fix For: 0.9.0 Given a memory address and a size, we should be able to construct an Arrow buffer from this. The additional base object will be used to hold a reference to the underlying, original buffer so that it does not go out of scope before the Arrow buffer. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
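The role of the "base" object described in this issue can be illustrated with the standard library alone. The sketch below is not the pyarrow API: `ForeignView` and `make_view` are hypothetical names, and `ctypes` stands in for Arrow's buffer machinery. It only shows the idea of wrapping an address and a size without copying, while holding a reference to the object that owns the memory.

```python
import ctypes


class ForeignView:
    """Hypothetical stand-in for a buffer built from (address, size, base)."""

    def __init__(self, address, size, base=None):
        self.address = address
        self.size = size
        # Holding `base` keeps the owner of the memory alive for as long as
        # this view exists, so the address range stays valid.
        self.base = base
        # Map the raw address range as a ctypes byte array without copying.
        self._bytes = (ctypes.c_ubyte * size).from_address(address)

    def to_bytes(self):
        # Copy the viewed memory out into an immutable bytes object.
        return bytes(self._bytes)


def make_view():
    # `owner` is local to this function; without the `base` reference it
    # could be freed as soon as the function returns.
    owner = (ctypes.c_ubyte * 8)(*range(8))
    return ForeignView(ctypes.addressof(owner), 8, base=owner)


view = make_view()
print(list(view.to_bytes()))
```

If `base` were dropped, the view would point at memory whose owner may already have been collected, which is the situation the issue description wants to rule out.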
[jira] [Updated] (ARROW-2245) [Python] Revert static linkage of parquet-cpp in manylinux1 wheel
[ https://issues.apache.org/jira/browse/ARROW-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-2245: -- Labels: pull-request-available (was: ) > [Python] Revert static linkage of parquet-cpp in manylinux1 wheel > - > > Key: ARROW-2245 > URL: https://issues.apache.org/jira/browse/ARROW-2245 > Project: Apache Arrow > Issue Type: Bug > Components: Python > Reporter: Uwe L. Korn > Assignee: Uwe L. Korn > Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Although in theory we are not the authoritative source of parquet-cpp > through the pyarrow manylinux1 wheel, in practice we are, and statically > linking parquet-cpp can introduce problems that dynamic linking does not > (e.g. the library being unloaded twice if it is included both in a Python > wheel and in the process that creates the Python interpreter). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2245) [Python] Revert static linkage of parquet-cpp in manylinux1 wheel
[ https://issues.apache.org/jira/browse/ARROW-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385170#comment-16385170 ] ASF GitHub Bot commented on ARROW-2245: --- xhochy opened a new pull request #1692: ARROW-2245: ARROW-2246: [Python] Revert static linkage of parquet-cpp in manylinux1 wheel URL: https://github.com/apache/arrow/pull/1692 > [Python] Revert static linkage of parquet-cpp in manylinux1 wheel > - > > Key: ARROW-2245 > URL: https://issues.apache.org/jira/browse/ARROW-2245 > Project: Apache Arrow > Issue Type: Bug > Components: Python > Reporter: Uwe L. Korn > Assignee: Uwe L. Korn > Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Although in theory we are not the authoritative source of parquet-cpp > through the pyarrow manylinux1 wheel, in practice we are, and statically > linking parquet-cpp can introduce problems that dynamic linking does not > (e.g. the library being unloaded twice if it is included both in a Python > wheel and in the process that creates the Python interpreter). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2238) [C++] Detect clcache in cmake configuration
[ https://issues.apache.org/jira/browse/ARROW-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385169#comment-16385169 ] ASF GitHub Bot commented on ARROW-2238: --- pitrou commented on issue #1684: ARROW-2238: [C++] Detect and use clcache in cmake configuration URL: https://github.com/apache/arrow/pull/1684#issuecomment-370234208 clcache works best with Ninja or NMake (*). My suggestion here would be to recommend Ninja + clcache for best build performance. The other concern, though, is to avoid breaking existing builds for those who prefer other generators (e.g. Visual Studio). (*) See the following links: - https://github.com/frerich/clcache/issues/273#issuecomment-354623452 - https://github.com/frerich/clcache/wiki/Integration#integration-for-visual-studio > [C++] Detect clcache in cmake configuration > --- > > Key: ARROW-2238 > URL: https://issues.apache.org/jira/browse/ARROW-2238 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ > Reporter: Antoine Pitrou > Assignee: Antoine Pitrou > Priority: Minor > Labels: pull-request-available > > By default Windows builds should use clcache if installed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2238) [C++] Detect clcache in cmake configuration
[ https://issues.apache.org/jira/browse/ARROW-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385116#comment-16385116 ] ASF GitHub Bot commented on ARROW-2238: --- MaxRis commented on issue #1684: ARROW-2238: [C++] Detect and use clcache in cmake configuration URL: https://github.com/apache/arrow/pull/1684#issuecomment-370228503 Update: it seems that the current solution, `set(CMAKE_CXX_COMPILER ${CLCACHE_FOUND})`, works only with the `NMake Makefiles` generator; clcache is not called if `Visual Studio 14 2015 Win64` or a similar generator is used. > [C++] Detect clcache in cmake configuration > --- > > Key: ARROW-2238 > URL: https://issues.apache.org/jira/browse/ARROW-2238 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ > Reporter: Antoine Pitrou > Assignee: Antoine Pitrou > Priority: Minor > Labels: pull-request-available > > By default Windows builds should use clcache if installed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2251) [GLib] Destroying GArrowBuffer while GArrowTensor that uses the buffer causes a crash
[ https://issues.apache.org/jira/browse/ARROW-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-2251: -- Labels: pull-request-available (was: ) > [GLib] Destroying GArrowBuffer while GArrowTensor that uses the buffer causes > a crash > - > > Key: ARROW-2251 > URL: https://issues.apache.org/jira/browse/ARROW-2251 > Project: Apache Arrow > Issue Type: Bug > Components: GLib >Affects Versions: 0.8.0 >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Minor > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2251) [GLib] Destroying GArrowBuffer while GArrowTensor that uses the buffer causes a crash
[ https://issues.apache.org/jira/browse/ARROW-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385107#comment-16385107 ] ASF GitHub Bot commented on ARROW-2251: --- kou opened a new pull request #1691: ARROW-2251: [GLib] Keep GArrowBuffer alive while GArrowTensor for the buffer is live URL: https://github.com/apache/arrow/pull/1691 This prevents a GC-related crash. > [GLib] Destroying GArrowBuffer while GArrowTensor that uses the buffer causes > a crash > - > > Key: ARROW-2251 > URL: https://issues.apache.org/jira/browse/ARROW-2251 > Project: Apache Arrow > Issue Type: Bug > Components: GLib > Affects Versions: 0.8.0 > Reporter: Kouhei Sutou > Assignee: Kouhei Sutou > Priority: Minor > Labels: pull-request-available > Fix For: 0.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2251) [GLib] Destroying GArrowBuffer while GArrowTensor that uses the buffer causes a crash
Kouhei Sutou created ARROW-2251: --- Summary: [GLib] Destroying GArrowBuffer while GArrowTensor that uses the buffer causes a crash Key: ARROW-2251 URL: https://issues.apache.org/jira/browse/ARROW-2251 Project: Apache Arrow Issue Type: Bug Components: GLib Affects Versions: 0.8.0 Reporter: Kouhei Sutou Assignee: Kouhei Sutou Fix For: 0.9.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)