[jira] [Created] (ARROW-8386) [Python] pyarrow.jvm raises error for empty Arrays
Bryan Cutler created ARROW-8386: --- Summary: [Python] pyarrow.jvm raises error for empty Arrays Key: ARROW-8386 URL: https://issues.apache.org/jira/browse/ARROW-8386 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.16.0 Reporter: Bryan Cutler Assignee: Bryan Cutler In the pyarrow.jvm module, when there is an empty array on the Java side, re-creating it in Python raises a ValueError. This is because Java returns an empty list of buffers for an empty array, and pyarrow.jvm then attempts to create the array by calling pa.Array.from_buffers with that empty list. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7966) [Integration][Flight][C++] Client should verify each batch independently
Bryan Cutler created ARROW-7966: --- Summary: [Integration][Flight][C++] Client should verify each batch independently Key: ARROW-7966 URL: https://issues.apache.org/jira/browse/ARROW-7966 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Bryan Cutler Currently the C++ Flight test client in {{test_integration_client.cc}} reads all batches from JSON into a Table, reads all batches in the flight stream from the server into a Table, then compares the Tables for equality. This is potentially a problem because a record batch might have specific information that is then lost in the conversion to a Table. For example, if the server sends empty batches, the resulting Table would not be different from one with no empty batches. Instead, the client should check each record batch from the JSON file against each record batch from the server independently.
[jira] [Created] (ARROW-7933) [Java][Flight][Tests] Add roundtrip tests for Java Flight Test Client
Bryan Cutler created ARROW-7933: --- Summary: [Java][Flight][Tests] Add roundtrip tests for Java Flight Test Client Key: ARROW-7933 URL: https://issues.apache.org/jira/browse/ARROW-7933 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC, Java Reporter: Bryan Cutler There should be some built-in roundtrip tests for the Java Flight IntegrationTestClient.
[jira] [Created] (ARROW-7770) [Release] Archery does not use correct integration test args
Bryan Cutler created ARROW-7770: --- Summary: [Release] Archery does not use correct integration test args Key: ARROW-7770 URL: https://issues.apache.org/jira/browse/ARROW-7770 Project: Apache Arrow Issue Type: Bug Components: Archery Reporter: Bryan Cutler Assignee: Bryan Cutler When using the release verification script and selecting specific integration tests, Archery ignores the selection and runs all tests.
[jira] [Created] (ARROW-7723) [Python] StructArray timestamp type with timezone to_pandas convert error
Bryan Cutler created ARROW-7723: --- Summary: [Python] StructArray timestamp type with timezone to_pandas convert error Key: ARROW-7723 URL: https://issues.apache.org/jira/browse/ARROW-7723 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Bryan Cutler When a {{StructArray}} has a child that is a timestamp with a timezone, the {{to_pandas}} conversion outputs an int64 instead of a timestamp
{code:java}
In [1]: import pyarrow as pa
   ...: import pandas as pd
   ...: arr = pa.array([{'start': pd.Timestamp.now(), 'end': pd.Timestamp.now()}])

In [2]: arr.to_pandas()
Out[2]:
0    {'end': 2020-01-29 11:38:02.792681, 'start': 2...
dtype: object

In [3]: ts = pd.Timestamp.now()

In [4]: arr2 = pa.array([ts], type=pa.timestamp('us', tz='America/New_York'))

In [5]: arr2.to_pandas()
Out[5]:
0   2020-01-29 06:38:47.848944-05:00
dtype: datetime64[ns, America/New_York]

In [6]: arr = pa.StructArray.from_arrays([arr2, arr2], ['start', 'stop'])

In [7]: arr.to_pandas()
Out[7]:
0    {'start': 1580297927848944000, 'stop': 1580297...
dtype: object
{code}
from https://github.com/apache/arrow/pull/6312
[jira] [Created] (ARROW-7709) [Python] Conversion from Table Column to Pandas loses name for Timestamps
Bryan Cutler created ARROW-7709: --- Summary: [Python] Conversion from Table Column to Pandas loses name for Timestamps Key: ARROW-7709 URL: https://issues.apache.org/jira/browse/ARROW-7709 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Bryan Cutler When converting a Table timestamp column to Pandas, the name of the column is lost in the resulting series.
{code:java}
In [23]: a1 = pa.array([pd.Timestamp.now()])

In [24]: a2 = pa.array([1])

In [25]: t = pa.Table.from_arrays([a1, a2], ['ts', 'a'])

In [26]: for c in t:
    ...:     print(c.to_pandas())
    ...:
0   2020-01-28 13:17:26.738708
dtype: datetime64[ns]
0    1
Name: a, dtype: int64
{code}
[jira] [Created] (ARROW-7693) [CI] Fix test-conda-python-3.7-spark-master nightly errors
Bryan Cutler created ARROW-7693: --- Summary: [CI] Fix test-conda-python-3.7-spark-master nightly errors Key: ARROW-7693 URL: https://issues.apache.org/jira/browse/ARROW-7693 Project: Apache Arrow Issue Type: Bug Components: Continuous Integration Reporter: Bryan Cutler Assignee: Bryan Cutler Spark master renamed some tests; the nightly job needs to be updated to match.
[jira] [Created] (ARROW-7502) [Integration] Remove Spark Integration patch that is no longer needed
Bryan Cutler created ARROW-7502: --- Summary: [Integration] Remove Spark Integration patch that is no longer needed Key: ARROW-7502 URL: https://issues.apache.org/jira/browse/ARROW-7502 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration Reporter: Bryan Cutler Assignee: Bryan Cutler Apache Spark master has been updated to work with Arrow 0.15.1 after the binary protocol change, and patching Spark master is no longer necessary to build with current Arrow, so the previous patch can be removed.
[jira] [Created] (ARROW-7223) [Java] Provide default setting of io.netty.tryReflectionSetAccessible=true
Bryan Cutler created ARROW-7223: --- Summary: [Java] Provide default setting of io.netty.tryReflectionSetAccessible=true Key: ARROW-7223 URL: https://issues.apache.org/jira/browse/ARROW-7223 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Bryan Cutler After ARROW-3191, consumers of Arrow Java on JDK 9 or above are required to set the JVM property "io.netty.tryReflectionSetAccessible=true" at startup, each time Arrow code is run, as documented at https://github.com/apache/arrow/tree/master/java#java-properties. Not doing this results in the error "java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.(long, int) not available", making Arrow unusable out of the box. This proposes to automatically set the property if not already set, in the following steps: 1) check whether the property io.netty.tryReflectionSetAccessible has been set; 2) if not set, automatically set it to "true"; 3) else if set to "false", catch the Netty error and prepend the error message with the suggested setting of "true".
[jira] [Created] (ARROW-7173) Add test to verify Map field names can be arbitrary
Bryan Cutler created ARROW-7173: --- Summary: Add test to verify Map field names can be arbitrary Key: ARROW-7173 URL: https://issues.apache.org/jira/browse/ARROW-7173 Project: Apache Arrow Issue Type: Test Components: Integration Reporter: Bryan Cutler A Map has child fields, and the format spec only recommends that they be named "entries", "key", and "value"; they could be named anything. Currently, integration tests for Map arrays verify that the exchanged schema is equal, so the child fields are always named the same. There should be tests that use different names to verify implementations can accept this.
[jira] [Created] (ARROW-6904) [Python] Implement MapArray and MapType
Bryan Cutler created ARROW-6904: --- Summary: [Python] Implement MapArray and MapType Key: ARROW-6904 URL: https://issues.apache.org/jira/browse/ARROW-6904 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Bryan Cutler Assignee: Bryan Cutler Fix For: 1.0.0 Map arrays are already implemented in C++; they need to be exposed in the Python API as well.
[jira] [Created] (ARROW-6790) [Release] Automatically disable integration test cases in release verification
Bryan Cutler created ARROW-6790: --- Summary: [Release] Automatically disable integration test cases in release verification Key: ARROW-6790 URL: https://issues.apache.org/jira/browse/ARROW-6790 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools Reporter: Bryan Cutler Assignee: Bryan Cutler If dev/release/verify-release-candidate.sh is run with selective testing and includes integration tests, the selected implementations should be the only ones enabled when running the integration test portion. For example:
TEST_DEFAULT=0 \
TEST_CPP=1 \
TEST_JAVA=1 \
TEST_INTEGRATION=1 \
dev/release/verify-release-candidate.sh source 0.15.0 2
should run integration tests only for C++ and Java.
[jira] [Created] (ARROW-6652) [Python] to_pandas conversion removes timezone from type
Bryan Cutler created ARROW-6652: --- Summary: [Python] to_pandas conversion removes timezone from type Key: ARROW-6652 URL: https://issues.apache.org/jira/browse/ARROW-6652 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Bryan Cutler Fix For: 0.15.0 Calling {{to_pandas}} on a {{pyarrow.Array}} with a timezone-aware timestamp type removes the timezone in the resulting {{pandas.Series}}.
{code}
>>> import pyarrow as pa
>>> a = pa.array([1], type=pa.timestamp('us', tz='America/Los_Angeles'))
>>> a.to_pandas()
0   1970-01-01 00:00:00.01
dtype: datetime64[ns]
{code}
Previous behavior from 0.14.1 of converting a {{pyarrow.Column}} {{to_pandas}} retained the timezone.
{code}
In [4]: import pyarrow as pa
   ...: a = pa.array([1], type=pa.timestamp('us', tz='America/Los_Angeles'))
   ...: c = pa.Column.from_array('ts', a)

In [5]: c.to_pandas()
Out[5]:
0   1969-12-31 16:00:00.01-08:00
Name: ts, dtype: datetime64[ns, America/Los_Angeles]
{code}
[jira] [Created] (ARROW-6534) [Java] Fix typos and spelling
Bryan Cutler created ARROW-6534: --- Summary: [Java] Fix typos and spelling Key: ARROW-6534 URL: https://issues.apache.org/jira/browse/ARROW-6534 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Bryan Cutler Assignee: Bryan Cutler Fix For: 0.15.0 Fix typos and spelling, mostly in docs and tests.
[jira] [Created] (ARROW-6519) [Java] Use IPC continuation token to mark EOS
Bryan Cutler created ARROW-6519: --- Summary: [Java] Use IPC continuation token to mark EOS Key: ARROW-6519 URL: https://issues.apache.org/jira/browse/ARROW-6519 Project: Apache Arrow Issue Type: Sub-task Components: Java Reporter: Bryan Cutler Assignee: Bryan Cutler Fix For: 0.15.0 For an Arrow stream in non-legacy mode, the EOS identifier should be {0x, 0x}. This way, all bytes sent by the writer can be read.
[jira] [Created] (ARROW-6461) [Java] EchoServer can close socket before client has finished reading
Bryan Cutler created ARROW-6461: --- Summary: [Java] EchoServer can close socket before client has finished reading Key: ARROW-6461 URL: https://issues.apache.org/jira/browse/ARROW-6461 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Bryan Cutler Fix For: 0.15.0 When the EchoServer finishes running the client connection, the socket is closed immediately. This causes a race condition and the client will fail with a {noformat} SocketException: connection reset {noformat} if it has not read all of the echoed batches. This was consistently happening with the fix for ARROW-6315
[jira] [Created] (ARROW-6215) [Java] RangeEqualVisitor does not properly compare ZeroVector
Bryan Cutler created ARROW-6215: --- Summary: [Java] RangeEqualVisitor does not properly compare ZeroVector Key: ARROW-6215 URL: https://issues.apache.org/jira/browse/ARROW-6215 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Bryan Cutler Assignee: Bryan Cutler ZeroVector.accept and RangeEqualVisitor always return true, no matter what type of vector it is compared against.
[jira] [Created] (ARROW-5762) [Integration][JS] Integration Tests for MapType
Bryan Cutler created ARROW-5762: --- Summary: [Integration][JS] Integration Tests for MapType Key: ARROW-5762 URL: https://issues.apache.org/jira/browse/ARROW-5762 Project: Apache Arrow Issue Type: Improvement Components: Integration, JavaScript Reporter: Bryan Cutler ARROW-1279 enabled integration tests for MapType between Java and C++, but JavaScript had to be disabled for the map case due to an error. Once this is fixed, {{generate_map_case}} could be moved under {{generate_nested_case}} with the other nested types.
[jira] [Created] (ARROW-5063) [Java] FlightClient should not create a child allocator
Bryan Cutler created ARROW-5063: --- Summary: [Java] FlightClient should not create a child allocator Key: ARROW-5063 URL: https://issues.apache.org/jira/browse/ARROW-5063 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Bryan Cutler I ran into a problem when testing Flight using the ExampleFlightServer with the InMemoryStore producer. A client will iterate over endpoints and locations to get the streams, and the example creates a new client for each location. The only way to close the allocator in the FlightClient is to close the FlightClient, which also closes the read channel. If the location is the same for each FlightStream (as is the case for the InMemoryStore), then it seems like gRPC will reuse the channel, so closing one client will shut down the channel and the remaining FlightStreams cannot be read. If the allocator were created by the owner of the FlightClient, then the client would not need to close it and this problem would be avoided. I believe other Flight classes do not create child allocators either, so this change would be consistent.
[jira] [Created] (ARROW-5062) Shade Java Guava dependency for Flight
Bryan Cutler created ARROW-5062: --- Summary: Shade Java Guava dependency for Flight Key: ARROW-5062 URL: https://issues.apache.org/jira/browse/ARROW-5062 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Bryan Cutler The Guava dependency in the Java Flight module can interfere if using Flight in an application that relies on an older version of Guava. We can shade the usage in Flight to prevent this.
[jira] [Created] (ARROW-5014) [Java] Fix typos in Flight module
Bryan Cutler created ARROW-5014: --- Summary: [Java] Fix typos in Flight module Key: ARROW-5014 URL: https://issues.apache.org/jira/browse/ARROW-5014 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Bryan Cutler Assignee: Bryan Cutler
[jira] [Created] (ARROW-4344) [Java] Further cleanup maven output
Bryan Cutler created ARROW-4344: --- Summary: [Java] Further cleanup maven output Key: ARROW-4344 URL: https://issues.apache.org/jira/browse/ARROW-4344 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Bryan Cutler Assignee: Bryan Cutler Followup to ARROW-4180: I noticed EchoServer logs info-level output that should be changed to debug. Also, upgrading the RAT license check plugin stops it from listing all excluded files, which currently amounts to a large volume of output since the check is run for every module.
[jira] [Created] (ARROW-3588) [Java] checkstyle - fix license
Bryan Cutler created ARROW-3588: --- Summary: [Java] checkstyle - fix license Key: ARROW-3588 URL: https://issues.apache.org/jira/browse/ARROW-3588 Project: Apache Arrow Issue Type: Sub-task Components: Java Reporter: Bryan Cutler Assignee: Bryan Cutler Make the header correspond to the Apache license defined in checkstyle.license
[jira] [Created] (ARROW-3323) [Java] checkstyle - fix naming
Bryan Cutler created ARROW-3323: --- Summary: [Java] checkstyle - fix naming Key: ARROW-3323 URL: https://issues.apache.org/jira/browse/ARROW-3323 Project: Apache Arrow Issue Type: Sub-task Components: Java Reporter: Bryan Cutler Assignee: Bryan Cutler Enable naming rules.
[jira] [Created] (ARROW-3273) [Java] checkstyle - fix javadoc style
Bryan Cutler created ARROW-3273: --- Summary: [Java] checkstyle - fix javadoc style Key: ARROW-3273 URL: https://issues.apache.org/jira/browse/ARROW-3273 Project: Apache Arrow Issue Type: Sub-task Components: Java Reporter: Bryan Cutler
[jira] [Created] (ARROW-3272) [Java] Document deviations from Google Style
Bryan Cutler created ARROW-3272: --- Summary: [Java] Document deviations from Google Style Key: ARROW-3272 URL: https://issues.apache.org/jira/browse/ARROW-3272 Project: Apache Arrow Issue Type: Sub-task Reporter: Bryan Cutler
[jira] [Created] (ARROW-3264) [Java] checkstyle - fix whitespace
Bryan Cutler created ARROW-3264: --- Summary: [Java] checkstyle - fix whitespace Key: ARROW-3264 URL: https://issues.apache.org/jira/browse/ARROW-3264 Project: Apache Arrow Issue Type: Sub-task Reporter: Bryan Cutler Assignee: Bryan Cutler Fix remaining whitespace issues
[jira] [Created] (ARROW-3171) [Java] checkstyle - fix line length and whitespace
Bryan Cutler created ARROW-3171: --- Summary: [Java] checkstyle - fix line length and whitespace Key: ARROW-3171 URL: https://issues.apache.org/jira/browse/ARROW-3171 Project: Apache Arrow Issue Type: Sub-task Components: Java Reporter: Bryan Cutler Assignee: Bryan Cutler
[jira] [Created] (ARROW-3115) [Java] Style Checks - Fix import ordering
Bryan Cutler created ARROW-3115: --- Summary: [Java] Style Checks - Fix import ordering Key: ARROW-3115 URL: https://issues.apache.org/jira/browse/ARROW-3115 Project: Apache Arrow Issue Type: Sub-task Components: Java Reporter: Bryan Cutler Assignee: Bryan Cutler Fix import ordering according to checkstyle
[jira] [Created] (ARROW-3111) [Java] Enable changing default logging level when running tests
Bryan Cutler created ARROW-3111: --- Summary: [Java] Enable changing default logging level when running tests Key: ARROW-3111 URL: https://issues.apache.org/jira/browse/ARROW-3111 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Bryan Cutler Assignee: Bryan Cutler Currently, tests use the Logback logger, which has a default level of DEBUG. We should provide a way to change this level so that CI can run a build without DEBUG messages if needed.
[jira] [Created] (ARROW-2923) [Doc] Add instructions for running Spark integration tests
Bryan Cutler created ARROW-2923: --- Summary: [Doc] Add instructions for running Spark integration tests Key: ARROW-2923 URL: https://issues.apache.org/jira/browse/ARROW-2923 Project: Apache Arrow Issue Type: Improvement Components: Documentation Reporter: Bryan Cutler Assignee: Bryan Cutler Add instructions to dev/README for running Spark integration tests
[jira] [Created] (ARROW-2914) [Integration] Add WindowPandasUDFTests to Spark Integration
Bryan Cutler created ARROW-2914: --- Summary: [Integration] Add WindowPandasUDFTests to Spark Integration Key: ARROW-2914 URL: https://issues.apache.org/jira/browse/ARROW-2914 Project: Apache Arrow Issue Type: Improvement Components: Integration Reporter: Bryan Cutler Assignee: Bryan Cutler Fix For: 0.10.0 Add PySpark tests for WindowPandasUDFTests to the Spark integration tests. Also, run the docker image against current Arrow master with a patched version of Spark.
[jira] [Created] (ARROW-2742) [Python] Allow Table.from_batches to use Iterator of ArrowRecordBatches
Bryan Cutler created ARROW-2742: --- Summary: [Python] Allow Table.from_batches to use Iterator of ArrowRecordBatches Key: ARROW-2742 URL: https://issues.apache.org/jira/browse/ARROW-2742 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Bryan Cutler Assignee: Bryan Cutler Currently, pyarrow.Table.from_batches requires a list of record batches. A simple change would allow it to accept an iterator as well, which could be useful.
[jira] [Created] (ARROW-2704) [Java] IPC stream handling should be more friendly to low level processing
Bryan Cutler created ARROW-2704: --- Summary: [Java] IPC stream handling should be more friendly to low level processing Key: ARROW-2704 URL: https://issues.apache.org/jira/browse/ARROW-2704 Project: Apache Arrow Issue Type: Improvement Components: Java - Vectors Reporter: Bryan Cutler Assignee: Bryan Cutler With some minor adjustments, the Java IPC stream reader could be made more friendly to low-level message processing, meaning reading a stream and examining messages without necessarily having to load the record batch data. These include:
* Separate MessageChannelReader.readNextMessage to allow access to the buffer containing the message.
* The MessageChannelReader input channel should be protected.
* ArrowStreamWriter should make the end-of-stream message static.
* WriteChannel.intToBytes could write to an existing byte array or buffer instead of creating a new one.
[jira] [Created] (ARROW-2645) [Java] ArrowStreamWriter accumulates DictionaryBatch ArrowBlocks
Bryan Cutler created ARROW-2645: --- Summary: [Java] ArrowStreamWriter accumulates DictionaryBatch ArrowBlocks Key: ARROW-2645 URL: https://issues.apache.org/jira/browse/ARROW-2645 Project: Apache Arrow Issue Type: Improvement Components: Java - Vectors Reporter: Bryan Cutler Assignee: Bryan Cutler While reading the code I noticed that the base method ensureStarted in ArrowStreamWriter accumulates DictionaryBatch ArrowBlocks. This is needed for ArrowFileWriter but not for ArrowStreamWriter.
[jira] [Created] (ARROW-2432) [Python] from_pandas fails when converting decimals if contain None
Bryan Cutler created ARROW-2432: --- Summary: [Python] from_pandas fails when converting decimals if contain None Key: ARROW-2432 URL: https://issues.apache.org/jira/browse/ARROW-2432 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.9.0 Reporter: Bryan Cutler Using from_pandas to convert decimals fails if it encounters a value of {{None}}. For example:
{code:java}
In [1]: import pyarrow as pa
   ...: import pandas as pd
   ...: from decimal import Decimal

In [2]: s_dec = pd.Series([Decimal('3.14'), None])

In [3]: pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
in ()
----> 1 pa.Array.from_pandas(s_dec, type=pa.decimal128(3, 2))

array.pxi in pyarrow.lib.Array.from_pandas()
array.pxi in pyarrow.lib.array()
error.pxi in pyarrow.lib.check_status()
error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Error converting from Python objects to Decimal: Got Python object of type NoneType but can only handle these types: decimal.Decimal

In [4]: s_dec
Out[4]:
0    3.14
1    None
dtype: object
{code}
The above error is raised when a decimal type is specified. When no type is specified, a segfault happens instead. This previously worked in 0.8.0.
[jira] [Created] (ARROW-2380) Correct issues in numpy_to_arrow conversion routines
Bryan Cutler created ARROW-2380: --- Summary: Correct issues in numpy_to_arrow conversion routines Key: ARROW-2380 URL: https://issues.apache.org/jira/browse/ARROW-2380 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.9.0 Reporter: Bryan Cutler Fix For: 0.10.0 Following the discussion at https://github.com/apache/arrow/pull/1689, there are a few issues with conversion of various types to Arrow that are incorrect or could be improved:
* PyBytes_GET_SIZE is being cast to the wrong type, for example {{const int32_t length = static_cast(PyBytes_GET_SIZE(obj));}}
* In the check {{builder->value_data_length() + length > kBinaryMemoryLimit}}, handle the possibility that length alone is larger than kBinaryMemoryLimit
* Look into using common code for binary object conversion to avoid duplication, and allow support for bytes and bytearray objects in places other than numpy_to_arrow (possibly put in src/arrow/python/helpers.h)
[jira] [Created] (ARROW-2101) [Python] from_pandas reads 'str' types as binary Arrow data with Python 2
Bryan Cutler created ARROW-2101: --- Summary: [Python] from_pandas reads 'str' types as binary Arrow data with Python 2 Key: ARROW-2101 URL: https://issues.apache.org/jira/browse/ARROW-2101 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.8.0 Reporter: Bryan Cutler Using Python 2, converting Pandas with 'str' data to Arrow results in Arrow data of binary type, even if the user supplies type information. Conversion of 'unicode' type works, creating Arrow data of string type. For example:
{code}
In [25]: pa.Array.from_pandas(pd.Series(['a'])).type
Out[25]: DataType(binary)

In [26]: pa.Array.from_pandas(pd.Series(['a']), type=pa.string()).type
Out[26]: DataType(binary)

In [27]: pa.Array.from_pandas(pd.Series([u'a'])).type
Out[27]: DataType(string)
{code}
[jira] [Created] (ARROW-1962) [Java] Add reset() to ValueVector interface
Bryan Cutler created ARROW-1962: --- Summary: [Java] Add reset() to ValueVector interface Key: ARROW-1962 URL: https://issues.apache.org/jira/browse/ARROW-1962 Project: Apache Arrow Issue Type: Improvement Components: Java - Vectors Reporter: Bryan Cutler Assignee: Bryan Cutler The {{reset()}} method exists in some ValueVectors but not all. Its meaning is that it will bring the vector to an empty state but not release any buffers (as opposed to clear(), which resets and releases buffers). It should be added to the {{ValueVector}} interface and implemented in the vector hierarchy where it currently is not.
[jira] [Created] (ARROW-1948) [Java] ListVector does not handle ipc with all non-null values with none set
Bryan Cutler created ARROW-1948: --- Summary: [Java] ListVector does not handle ipc with all non-null values with none set Key: ARROW-1948 URL: https://issues.apache.org/jira/browse/ARROW-1948 Project: Apache Arrow Issue Type: Bug Components: Java - Vectors Affects Versions: 0.8.0 Reporter: Bryan Cutler Assignee: Bryan Cutler It is valid for IPC to send a validity buffer with no values set to indicate that all values are non-null. This is already handled by all vectors except ListVector, which will throw an invalid index exception in this case because it does not build the validity buffer with all elements set.
[jira] [Created] (ARROW-1915) [Python] Parquet tests should be optional
Bryan Cutler created ARROW-1915: --- Summary: [Python] Parquet tests should be optional Key: ARROW-1915 URL: https://issues.apache.org/jira/browse/ARROW-1915 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Bryan Cutler Assignee: Bryan Cutler Priority: Trivial Two decimal tests in {{test_parquet.py}} are missing the @parquet decorator that allows skipping them if Parquet is not installed, resulting in failures.
[jira] [Created] (ARROW-1906) [Python] Creating a pyarrow.Array with timestamp of different unit is not casted
Bryan Cutler created ARROW-1906: --- Summary: [Python] Creating a pyarrow.Array with timestamp of different unit is not casted Key: ARROW-1906 URL: https://issues.apache.org/jira/browse/ARROW-1906 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Bryan Cutler This is similar to ARROW-1680 but slightly different in that an error is not raised; the unit remains unchanged, but only when using a timezone.
{noformat}
In [47]: us_with_tz = pa.timestamp('us', tz='America/New_York')

In [48]: s = pd.Series([val])

In [49]: s_nyc = s.dt.tz_localize('tzlocal()').dt.tz_convert('America/New_York')

In [50]: arr = pa.Array.from_pandas(s_nyc, type=us_with_tz)

In [51]: arr.type
Out[51]: TimestampType(timestamp[ns, tz=America/New_York])

In [52]: arr2 = pa.Array.from_pandas(s, type=pa.timestamp('us'))

In [53]: arr2.type
Out[53]: TimestampType(timestamp[us])
{noformat}
There is an easy workaround of applying the cast after creating the pyarrow.Array, which seems to work fine:
{noformat}
In [54]: arr = pa.Array.from_pandas(s_nyc).cast(us_with_tz, safe=False)

In [55]: arr.type
Out[55]: TimestampType(timestamp[us, tz=America/New_York])
{noformat}
[jira] [Created] (ARROW-1868) [Java] Change vector getMinorType to use MinorType instead of Types.MinorType
Bryan Cutler created ARROW-1868: --- Summary: [Java] Change vector getMinorType to use MinorType instead of Types.MinorType Key: ARROW-1868 URL: https://issues.apache.org/jira/browse/ARROW-1868 Project: Apache Arrow Issue Type: Sub-task Components: Java - Vectors Reporter: Bryan Cutler This is just some renaming to clean things up.
[jira] [Created] (ARROW-1867) [Java] Add BitVector APIs from old vector class
Bryan Cutler created ARROW-1867: --- Summary: [Java] Add BitVector APIs from old vector class Key: ARROW-1867 URL: https://issues.apache.org/jira/browse/ARROW-1867 Project: Apache Arrow Issue Type: Sub-task Components: Java - Vectors Reporter: Bryan Cutler The new BitVector class after the refactoring does not have some of the APIs from the previous class such as {{setRangeToOnes}}, etc. Also, I believe {{getNullCount}} returned the number of zeros in the vector.
[jira] [Created] (ARROW-1866) [JAVA] Combine MapVector and NonNullableMapVector Classes
Bryan Cutler created ARROW-1866: --- Summary: [JAVA] Combine MapVector and NonNullableMapVector Classes Key: ARROW-1866 URL: https://issues.apache.org/jira/browse/ARROW-1866 Project: Apache Arrow Issue Type: Sub-task Components: Java - Vectors Reporter: Bryan Cutler Assignee: Bryan Cutler {{NonNullableMapVector}} class can be merged into {{MapVector}} and removed as part of removing the non nullable vectors.
[jira] [Created] (ARROW-1818) Examine Java Dependencies
Bryan Cutler created ARROW-1818: --- Summary: Examine Java Dependencies Key: ARROW-1818 URL: https://issues.apache.org/jira/browse/ARROW-1818 Project: Apache Arrow Issue Type: Task Components: Java - Vectors Reporter: Bryan Cutler Assignee: Bryan Cutler Fix For: 0.8.0 While integrating the latest Arrow Java with Spark master, I noticed some possible binary incompatibilities with dependencies. I'd like to examine these a little closer and make sure there are no problems before 0.8 is cut.
{noformat}
Found version conflict(s) in library dependencies; some are suspected to be binary incompatible:
[warn]
[warn] * com.google.code.findbugs:jsr305:3.0.2 is selected over {1.3.9, 3.0.0}
[warn]     +- org.apache.arrow:arrow-vector:0.8.0-SNAPSHOT (depends on 3.0.2)
[warn]     +- org.apache.arrow:arrow-memory:0.8.0-SNAPSHOT (depends on 3.0.2)
[warn]     +- org.apache.hadoop:hadoop-common:2.7.3 (depends on 3.0.0)
[warn]     +- org.apache.spark:spark-unsafe_2.11:2.3.0-SNAPSHOT (depends on 1.3.9)
[warn]     +- org.apache.spark:spark-core_2.11:2.3.0-SNAPSHOT (depends on 1.3.9)
[warn]     +- org.apache.spark:spark-network-common_2.11:2.3.0-SNAPSHOT (depends on 1.3.9)
[warn]
[warn] * io.netty:netty:3.9.9.Final is selected over {3.6.2.Final, 3.7.0.Final}
[warn]     +- org.apache.spark:spark-core_2.11:2.3.0-SNAPSHOT (depends on 3.9.9.Final)
[warn]     +- org.apache.hadoop:hadoop-hdfs:2.7.3 (depends on 3.6.2.Final)
[warn]     +- org.apache.zookeeper:zookeeper:3.4.6 (depends on 3.6.2.Final)
[warn]
[warn] * io.netty:netty-all:4.0.47.Final is selected over 4.0.23.Final
[warn]     +- org.apache.hadoop:hadoop-hdfs:2.7.3 (depends on 4.0.23.Final)
[warn]     +- org.apache.spark:spark-core_2.11:2.3.0-SNAPSHOT (depends on 4.0.23.Final)
[warn]     +- org.apache.spark:spark-network-common_2.11:2.3.0-SNAPSHOT (depends on 4.0.23.Final)
[warn]
[warn] * commons-net:commons-net:3.1 is selected over 2.2
[warn]     +- org.apache.hadoop:hadoop-common:2.7.3 (depends on 3.1)
[warn]     +- org.apache.spark:spark-core_2.11:2.3.0-SNAPSHOT (depends on 2.2)
{noformat}
[jira] [Created] (ARROW-1817) Configure JsonFileReader to read NaN for floats
Bryan Cutler created ARROW-1817: --- Summary: Configure JsonFileReader to read NaN for floats Key: ARROW-1817 URL: https://issues.apache.org/jira/browse/ARROW-1817 Project: Apache Arrow Issue Type: Improvement Components: Java - Vectors Reporter: Bryan Cutler Assignee: Bryan Cutler Fix For: 0.8.0 There is a Spark unit test that includes reading JSON floating point values that are NaNs (with the validity bit set). The Jackson parser in Arrow version 0.4 allowed these by default, but it looks like the updated version requires the {{ALLOW_NON_NUMERIC_NUMBERS}} feature to be enabled. https://fasterxml.github.io/jackson-core/javadoc/2.2.0/com/fasterxml/jackson/core/JsonParser.Feature.html#ALLOW_NON_NUMERIC_NUMBERS -- This message was sent by Atlassian JIRA (v6.4.14#64029)
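Python's standard {{json}} module illustrates the same design choice with the defaults reversed (this is a hedged analogue, not the Arrow Java code): non-numeric constants like NaN are accepted out of the box, and a {{parse_constant}} hook can be used to reject them, mirroring Jackson's opt-in {{ALLOW_NON_NUMERIC_NUMBERS}} feature.

```python
import json
import math

# By default, Python's json parser accepts the non-standard constants
# NaN, Infinity, and -Infinity (like Jackson with
# ALLOW_NON_NUMERIC_NUMBERS enabled).
values = json.loads('[1.5, NaN, Infinity]')
assert math.isnan(values[1])
assert math.isinf(values[2])

# A strict parser can reject them via the parse_constant hook, which is
# the behavior Arrow's updated Jackson exhibited by default.
def reject(const):
    raise ValueError('non-numeric number not allowed: %s' % const)

try:
    json.loads('[NaN]', parse_constant=reject)
except ValueError as exc:
    print('rejected:', exc)
```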
[jira] [Created] (ARROW-1718) [Python] Creating a pyarrow.Array of date type from pandas causes error
Bryan Cutler created ARROW-1718: --- Summary: [Python] Creating a pyarrow.Array of date type from pandas causes error Key: ARROW-1718 URL: https://issues.apache.org/jira/browse/ARROW-1718 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Bryan Cutler When calling {{Array.from_pandas}} with a pandas.Series of dates and specifying the desired pyarrow type, an error occurs. If the type is not specified, then {{from_pandas}} will interpret the data as a timestamp type.
{code}
import pandas as pd
import pyarrow as pa
import datetime

arr = pa.array([datetime.date(2017, 10, 23)])
c = pa.Column.from_array("d", arr)
s = c.to_pandas()
print(s)
# 0   2017-10-23
# Name: d, dtype: datetime64[ns]

result = pa.Array.from_pandas(s, type=pa.date32())
print(result)
"""
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/array.pxi", line 295, in pyarrow.lib.Array.__repr__ (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:26221)
  File "/home/bryan/.local/lib/python2.7/site-packages/pyarrow-0.7.2.dev21+ng028f2cd-py2.7-linux-x86_64.egg/pyarrow/formatting.py", line 28, in array_format
    values.append(value_format(x, 0))
  File "/home/bryan/.local/lib/python2.7/site-packages/pyarrow-0.7.2.dev21+ng028f2cd-py2.7-linux-x86_64.egg/pyarrow/formatting.py", line 49, in value_format
    return repr(x)
  File "pyarrow/scalar.pxi", line 63, in pyarrow.lib.ArrayValue.__repr__ (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:19535)
  File "pyarrow/scalar.pxi", line 137, in pyarrow.lib.Date32Value.as_py (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:20368)
ValueError: year is out of range
"""
{code}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1680) [Python] Timestamp unit change not done in from_pandas() conversion
Bryan Cutler created ARROW-1680: --- Summary: [Python] Timestamp unit change not done in from_pandas() conversion Key: ARROW-1680 URL: https://issues.apache.org/jira/browse/ARROW-1680 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Bryan Cutler Calling {{Array.from_pandas}} with a pandas.Series of timestamps that have 'ns' unit, while specifying a 'us' type to coerce to, causes problems. When the series has timestamps with a timezone, the unit is ignored. When the series does not have a timezone, the unit is applied but printing then raises an OverflowError.
{noformat}
>>> import pandas as pd
>>> import pyarrow as pa
>>> from datetime import datetime
>>> s = pd.Series([datetime.now()])
>>> s_nyc = s.dt.tz_localize('tzlocal()').dt.tz_convert('America/New_York')
>>> arr = pa.Array.from_pandas(s_nyc, type=pa.timestamp('us', tz='America/New_York'))
>>> arr.type
TimestampType(timestamp[ns, tz=America/New_York])
>>> arr = pa.Array.from_pandas(s, type=pa.timestamp('us'))
>>> arr.type
TimestampType(timestamp[us])
>>> print(arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/array.pxi", line 295, in pyarrow.lib.Array.__repr__ (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:26221)
    values = array_format(self, window=10)
  File "pyarrow/formatting.py", line 28, in array_format
    values.append(value_format(x, 0))
  File "pyarrow/formatting.py", line 49, in value_format
    return repr(x)
  File "pyarrow/scalar.pxi", line 63, in pyarrow.lib.ArrayValue.__repr__ (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:19535)
    return repr(self.as_py())
  File "pyarrow/scalar.pxi", line 240, in pyarrow.lib.TimestampValue.as_py (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:21600)
    return converter(value, tzinfo=tzinfo)
  File "pyarrow/scalar.pxi", line 204, in pyarrow.lib.lambda5 (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:7295)
    TimeUnit_MICRO: lambda x, tzinfo: pd.Timestamp(
  File "pandas/_libs/tslib.pyx", line 402, in pandas._libs.tslib.Timestamp.__new__ (pandas/_libs/tslib.c:10051)
  File "pandas/_libs/tslib.pyx", line 1467, in pandas._libs.tslib.convert_to_tsobject (pandas/_libs/tslib.c:27665)
OverflowError: Python int too large to convert to C long
{noformat}
A workaround is to manually change the values with astype:
{noformat}
>>> arr = pa.Array.from_pandas(s.values.astype('datetime64[us]'))
>>> arr.type
TimestampType(timestamp[us])
>>> print(arr)
[
  Timestamp('2017-10-17 11:04:44.308233')
]
>>>
{noformat}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1619) [Java] Correctly set "lastSet" for variable vectors in JsonReader
Bryan Cutler created ARROW-1619: --- Summary: [Java] Correctly set "lastSet" for variable vectors in JsonReader Key: ARROW-1619 URL: https://issues.apache.org/jira/browse/ARROW-1619 Project: Apache Arrow Issue Type: Bug Components: Java - Vectors Reporter: Bryan Cutler Assignee: Bryan Cutler The Arrow Java JsonFileReader does not correctly set "lastSet" in VariableWidthVectors which makes reading inner vectors overly complicated. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1613) [Java] ArrowReader should not close the input ReadChannel
Bryan Cutler created ARROW-1613: --- Summary: [Java] ArrowReader should not close the input ReadChannel Key: ARROW-1613 URL: https://issues.apache.org/jira/browse/ARROW-1613 Project: Apache Arrow Issue Type: Bug Components: Java - Vectors Reporter: Bryan Cutler Assignee: Bryan Cutler Currently, {{ArrowReader.close()}} will close resources (VectorSchemaRoot and Dictionary Vectors) and also close the input ReadChannel, or InputStream for ArrowStreamReader. Closing of the ReadChannel should be done by whatever created it, because it might need to be reused. If this is not possible, an alternative could be to add a method {{ArrowReader.end()}} that closes resources but not the ReadChannel. Then {{end()}} could be called instead of {{close()}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
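The proposed separation can be sketched in pure Python (a hypothetical analogue with illustrative names, not the Arrow Java API): the reader owns its vectors and dictionaries but only borrows the channel, so an {{end()}}-style method releases the reader's resources while leaving the underlying stream open for reuse.

```python
import io

class StreamReader:
    """Hypothetical reader that borrows, but does not own, its input stream."""

    def __init__(self, stream):
        self._stream = stream
        self._buffers = []  # stands in for VectorSchemaRoot / dictionary vectors

    def read_chunk(self, n=4):
        data = self._stream.read(n)
        self._buffers.append(data)
        return data

    def end(self):
        # Release only the reader-owned resources; the caller keeps the
        # stream and may reuse or close it itself.
        self._buffers.clear()

source = io.BytesIO(b"abcdefgh")
reader = StreamReader(source)
reader.read_chunk()
reader.end()

# The input stream is still open and positioned where the reader left it.
assert not source.closed
assert source.read() == b"efgh"
```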
[jira] [Created] (ARROW-1361) [Java] Add minor type param accessors to NullableValueVectors
Bryan Cutler created ARROW-1361: --- Summary: [Java] Add minor type param accessors to NullableValueVectors Key: ARROW-1361 URL: https://issues.apache.org/jira/browse/ARROW-1361 Project: Apache Arrow Issue Type: Improvement Components: Java - Vectors Reporter: Bryan Cutler Assignee: Bryan Cutler A {{NullableValueVector}} creates private copies of each param in the minor type, but does not expose a public API to access them. So given a {{NullableValueVector}}, you have to go through the {{Field}} and cast to the correct type. For example, with a {{NullableTimeStampMicroTZVector}}, getting the timezone looks like:
{noformat}
if field.getType.isInstanceOf[ArrowType.Timestamp] &&
   field.getType.asInstanceOf[ArrowType.Timestamp].getTimezone
{noformat}
It would be more convenient to have direct accessors for these type params. Some minor refactoring is also possible: {{NullableValueVectors}} does not use these type params itself, so there is no need to store them; they already exist in the inner vector object and the Field type. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1352) [Integration] Improve print formatting for producer, consumer line
Bryan Cutler created ARROW-1352: --- Summary: [Integration] Improve print formatting for producer, consumer line Key: ARROW-1352 URL: https://issues.apache.org/jira/browse/ARROW-1352 Project: Apache Arrow Issue Type: Improvement Reporter: Bryan Cutler Assignee: Bryan Cutler Priority: Trivial When running integration tests, the line indicating the producer/consumer in the output gets jumbled with the rest. This line should stand out to allow easier visual inspection of which producers/consumers were run. Here is some of the current output as it changes producer/consumer:
{noformat}
== Testing file /tmp/tmpso6golfs/generated_dictionary.json ==
-- Creating binary inputs
-- Validating file
-- Validating stream
-- Java producing, Java consuming
== Testing file /home/bryan/git/arrow/integration/data/struct_example.json ==
-- Creating binary inputs
-- Validating file
-- Validating stream
{noformat}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1283) [Java] VectorSchemaRoot should be able to be closed() more than once
Bryan Cutler created ARROW-1283: --- Summary: [Java] VectorSchemaRoot should be able to be closed() more than once Key: ARROW-1283 URL: https://issues.apache.org/jira/browse/ARROW-1283 Project: Apache Arrow Issue Type: Bug Components: Java - Vectors Reporter: Bryan Cutler Assignee: Bryan Cutler When working with a VectorSchemaRoot, once it is no longer needed its resources are freed by calling {{close()}}, followed by closing the allocator. Sometimes a second close is needed due to complex operations. If the VectorSchemaRoot is closed again after the allocator, it raises an assertion error during {{clear()}} because it tries to allocate an empty buffer. The {{close()}} operation should mean that the object is no longer to be used, so this empty buffer is not needed and ends up being destroyed immediately anyway. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
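The requested behavior is the usual idempotent-close pattern. A minimal Python sketch (hypothetical, not the Arrow Java implementation) guards the clear-like work behind a closed flag so a second {{close()}} is a no-op rather than an allocation-time assertion:

```python
class Root:
    """Hypothetical resource holder whose close() is safe to call twice."""

    def __init__(self):
        self.buffers = ["buf0", "buf1"]
        self._closed = False

    def close(self):
        if self._closed:
            return  # already released; do not touch the allocator again
        self.buffers.clear()
        self._closed = True

root = Root()
root.close()
root.close()  # second call is a harmless no-op
assert root.buffers == []
```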
[jira] [Created] (ARROW-1268) [Website] Blog post on Arrow integration with Spark
Bryan Cutler created ARROW-1268: --- Summary: [Website] Blog post on Arrow integration with Spark Key: ARROW-1268 URL: https://issues.apache.org/jira/browse/ARROW-1268 Project: Apache Arrow Issue Type: New Feature Components: Website Reporter: Bryan Cutler Assignee: Bryan Cutler Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1245) [Integration] Java Integration Tests Disabled
Bryan Cutler created ARROW-1245: --- Summary: [Integration] Java Integration Tests Disabled Key: ARROW-1245 URL: https://issues.apache.org/jira/browse/ARROW-1245 Project: Apache Arrow Issue Type: Bug Reporter: Bryan Cutler Assignee: Bryan Cutler JavaTester in Integration tests is commented out. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1238) [Java] Add JSON read/write support for decimals for integration tests
Bryan Cutler created ARROW-1238: --- Summary: [Java] Add JSON read/write support for decimals for integration tests Key: ARROW-1238 URL: https://issues.apache.org/jira/browse/ARROW-1238 Project: Apache Arrow Issue Type: New Feature Components: Java - Vectors Reporter: Bryan Cutler Assignee: Bryan Cutler -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1232) [Java] Decouple ArrowStreamReader from specific data input
Bryan Cutler created ARROW-1232: --- Summary: [Java] Decouple ArrowStreamReader from specific data input Key: ARROW-1232 URL: https://issues.apache.org/jira/browse/ARROW-1232 Project: Apache Arrow Issue Type: Improvement Components: Java - Vectors Reporter: Bryan Cutler Currently, the ArrowStreamReader must be constructed with a channel/stream to read data from. It would be better to use an abstraction that would decouple a specific input stream from the incoming messages. This is following the discussion from https://github.com/apache/arrow/pull/839 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1184) [Java] Dictionary.equals is not working correctly
Bryan Cutler created ARROW-1184: --- Summary: [Java] Dictionary.equals is not working correctly Key: ARROW-1184 URL: https://issues.apache.org/jira/browse/ARROW-1184 Project: Apache Arrow Issue Type: Bug Components: Java - Vectors Reporter: Bryan Cutler The {{Dictionary.equals}} method does not return true when the dictionaries are in fact equal. This is because {{equals}} is not implemented for FieldVector, so the comparison defaults to reference equality on the two objects rather than comparing the vector data. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
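The failure mode is the classic reference-equality trap. A pure-Python sketch (class names are illustrative analogues, not the Arrow Java classes) shows how a container's equality silently degrades when a member type lacks a value-based comparison:

```python
class Vector:
    """Stands in for FieldVector: no __eq__, so == falls back to identity."""
    def __init__(self, values):
        self.values = values

class FixedVector(Vector):
    """Value-based comparison, as Dictionary.equals needs."""
    def __eq__(self, other):
        return isinstance(other, Vector) and self.values == other.values

class Dictionary:
    def __init__(self, vector, dict_id):
        self.vector = vector
        self.dict_id = dict_id

    def __eq__(self, other):
        # Delegates to the vector; correctness depends on the vector's __eq__.
        return self.dict_id == other.dict_id and self.vector == other.vector

# With identity-only vectors, equal data still compares unequal (the bug).
a = Dictionary(Vector([1, 2, 3]), dict_id=1)
b = Dictionary(Vector([1, 2, 3]), dict_id=1)
assert a != b

# Once the vector compares by data, the dictionaries compare as expected.
c = Dictionary(FixedVector([1, 2, 3]), dict_id=1)
d = Dictionary(FixedVector([1, 2, 3]), dict_id=1)
assert c == d
```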
[jira] [Created] (ARROW-1181) [Python] Parquet test fail if not enabled
Bryan Cutler created ARROW-1181: --- Summary: [Python] Parquet test fail if not enabled Key: ARROW-1181 URL: https://issues.apache.org/jira/browse/ARROW-1181 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Bryan Cutler Assignee: Bryan Cutler Priority: Trivial The test {{test_multiindex_duplicate_values}} fails if Parquet support is not installed; I believe it just needs the {{@parquet}} annotation. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ARROW-1053) [Python] Memory leak with RecordBatchFileReader
Bryan Cutler created ARROW-1053: --- Summary: [Python] Memory leak with RecordBatchFileReader Key: ARROW-1053 URL: https://issues.apache.org/jira/browse/ARROW-1053 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Bryan Cutler While working on SPARK-13534 and running repeated calls to {{toPandas}}, memory usage continues to climb, and I isolated it to the Python side. The following code reproduces the issue, which looks like a memory leak. If I comment out the block with the {{RecordBatchFileReader}} while leaving the writer, memory usage is stable, so I believe the issue is with the reader.
{noformat}
import pyarrow as pa
import numpy as np
import memory_profiler
import gc
import io

def leak():
    data = [pa.array(np.concatenate([np.random.randn(10)] * 10))]
    table = pa.Table.from_arrays(data, ['foo'])

    while True:
        print('calling to_pandas')
        print('memory_usage: {0}'.format(memory_profiler.memory_usage()))
        df = table.to_pandas()
        batch = pa.RecordBatch.from_pandas(df)

        sink = io.BytesIO()
        writer = pa.RecordBatchFileWriter(sink, batch.schema)
        writer.write_batch(batch)
        writer.close()

        reader = pa.open_file(pa.BufferReader(sink.getvalue()))
        reader.read_all()
        gc.collect()

leak()
{noformat}
Some of the output from the code above:
{noformat}
calling to_pandas
memory_usage: [67.0546875]
calling to_pandas
memory_usage: [143.95703125]
calling to_pandas
memory_usage: [151.58984375]
calling to_pandas
memory_usage: [174.453125]
calling to_pandas
memory_usage: [189.84765625]
calling to_pandas
memory_usage: [212.7109375]
calling to_pandas
memory_usage: [228.046875]
calling to_pandas
memory_usage: [243.109375]
calling to_pandas
memory_usage: [258.4375]
calling to_pandas
memory_usage: [273.83203125]
calling to_pandas
memory_usage: [288.90234375]
calling to_pandas
memory_usage: [304.23046875]
calling to_pandas
memory_usage: [319.625]
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (ARROW-701) [Java] Support additional Date metadata
Bryan Cutler created ARROW-701: -- Summary: [Java] Support additional Date metadata Key: ARROW-701 URL: https://issues.apache.org/jira/browse/ARROW-701 Project: Apache Arrow Issue Type: New Feature Components: Java - Vectors Reporter: Bryan Cutler Assignee: Bryan Cutler The Date type format from ARROW-316 introduced a DateUnit -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (ARROW-611) [Java] TimeVector TypeLayout is incorrectly specified as 64 bit width
[ https://issues.apache.org/jira/browse/ARROW-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved ARROW-611. Resolution: Not A Problem Fixed in ARROW-673 > [Java] TimeVector TypeLayout is incorrectly specified as 64 bit width > - > > Key: ARROW-611 > URL: https://issues.apache.org/jira/browse/ARROW-611 > Project: Apache Arrow > Issue Type: Bug > Components: Java - Vectors >Affects Versions: 0.2.0 >Reporter: Bryan Cutler >Assignee: Bryan Cutler > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (ARROW-615) Move ByteArrayReadableSeekableByteChannel to vector.util package
[ https://issues.apache.org/jira/browse/ARROW-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905880#comment-15905880 ] Bryan Cutler commented on ARROW-615: PR: https://github.com/apache/arrow/pull/370 > Move ByteArrayReadableSeekableByteChannel to vector.util package > > > Key: ARROW-615 > URL: https://issues.apache.org/jira/browse/ARROW-615 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Bryan Cutler >Assignee: Bryan Cutler >Priority: Minor > > The ByteArrayReadableSeekableByteChannel is useful when reading an > ArrowRecordBatch from a byte array with ArrowReader. Currently it is > vector.file test package, this is proposing to move to > src/main/java/o.a.a.vector.util -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (ARROW-615) Move ByteArrayReadableSeekableByteChannel to vector.util package
[ https://issues.apache.org/jira/browse/ARROW-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned ARROW-615: -- Assignee: Bryan Cutler > Move ByteArrayReadableSeekableByteChannel to vector.util package > > > Key: ARROW-615 > URL: https://issues.apache.org/jira/browse/ARROW-615 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Bryan Cutler >Assignee: Bryan Cutler >Priority: Minor > > The ByteArrayReadableSeekableByteChannel is useful when reading an > ArrowRecordBatch from a byte array with ArrowReader. Currently it is > vector.file test package, this is proposing to move to > src/main/java/o.a.a.vector.util -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (ARROW-615) Move ByteArrayReadableSeekableByteChannel to vector.util package
Bryan Cutler created ARROW-615: -- Summary: Move ByteArrayReadableSeekableByteChannel to vector.util package Key: ARROW-615 URL: https://issues.apache.org/jira/browse/ARROW-615 Project: Apache Arrow Issue Type: Improvement Reporter: Bryan Cutler Priority: Minor The ByteArrayReadableSeekableByteChannel is useful when reading an ArrowRecordBatch from a byte array with ArrowReader. Currently it is in the vector.file test package; this proposes moving it to src/main/java/o.a.a.vector.util -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (ARROW-612) [Java] Field toString should show nullable flag status
[ https://issues.apache.org/jira/browse/ARROW-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905845#comment-15905845 ] Bryan Cutler commented on ARROW-612: PR: https://github.com/apache/arrow/pull/368 > [Java] Field toString should show nullable flag status > -- > > Key: ARROW-612 > URL: https://issues.apache.org/jira/browse/ARROW-612 > Project: Apache Arrow > Issue Type: Improvement > Components: Java - Vectors >Reporter: Bryan Cutler >Assignee: Bryan Cutler >Priority: Trivial > > Often when comparing Schemas, I'll see an error message like below because of > differing schemas. The only difference is one is nullable and one is not, > but that info is not printed in the {{Field.toString}} method. > {noformat} > - numeric type conversion *** FAILED *** (118 milliseconds) > [info] java.lang.IllegalArgumentException: Different schemas: > [info] Schema > [info] Schema > [info] at > org.apache.arrow.vector.util.Validator.compareSchemas(Validator.java:43) > {noformat} > I would be nice to match the C++ {{Field.toString}} that prints " not null " > only if the nullable flag is not set. Which would then look like this > {noformat} > [info] Schema > [info] Schema > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (ARROW-612) [Java] Field toString should show nullable flag status
[ https://issues.apache.org/jira/browse/ARROW-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905812#comment-15905812 ] Bryan Cutler commented on ARROW-612: I'll post a patch > [Java] Field toString should show nullable flag status > -- > > Key: ARROW-612 > URL: https://issues.apache.org/jira/browse/ARROW-612 > Project: Apache Arrow > Issue Type: Improvement > Components: Java - Vectors >Reporter: Bryan Cutler >Assignee: Bryan Cutler >Priority: Trivial > > Often when comparing Schemas, I'll see an error message like below because of > differing schemas. The only difference is one is nullable and one is not, > but that info is not printed in the {{Field.toString}} method. > {noformat} > - numeric type conversion *** FAILED *** (118 milliseconds) > [info] java.lang.IllegalArgumentException: Different schemas: > [info] Schema > [info] Schema > [info] at > org.apache.arrow.vector.util.Validator.compareSchemas(Validator.java:43) > {noformat} > I would be nice to match the C++ {{Field.toString}} that prints " not null " > only if the nullable flag is not set. Which would then look like this > {noformat} > [info] Schema > [info] Schema > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (ARROW-612) [Java] Field toString should show nullable flag status
Bryan Cutler created ARROW-612: -- Summary: [Java] Field toString should show nullable flag status Key: ARROW-612 URL: https://issues.apache.org/jira/browse/ARROW-612 Project: Apache Arrow Issue Type: Improvement Components: Java - Vectors Reporter: Bryan Cutler Assignee: Bryan Cutler Priority: Trivial Often when comparing Schemas, I'll see an error message like below because of differing schemas. The only difference is that one is nullable and one is not, but that info is not printed by the {{Field.toString}} method.
{noformat}
- numeric type conversion *** FAILED *** (118 milliseconds)
[info] java.lang.IllegalArgumentException: Different schemas:
[info] Schema
[info] Schema
[info] at org.apache.arrow.vector.util.Validator.compareSchemas(Validator.java:43)
{noformat}
It would be nice to match the C++ {{Field.toString}}, which prints " not null " only if the nullable flag is not set. The output would then look like this:
{noformat}
[info] Schema
[info] Schema
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
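The requested output can be sketched with a hypothetical Python analogue of {{Field.toString}} (names and format are illustrative, not the Arrow Java code): append " not null" only when the field is non-nullable, matching the C++ convention.

```python
class Field:
    """Hypothetical schema field with a nullable flag."""

    def __init__(self, name, type_name, nullable=True):
        self.name = name
        self.type_name = type_name
        self.nullable = nullable

    def __str__(self):
        # Match the C++ Field::ToString convention: only non-nullable
        # fields get the extra " not null" marker.
        s = '%s: %s' % (self.name, self.type_name)
        if not self.nullable:
            s += ' not null'
        return s

assert str(Field('a', 'int32')) == 'a: int32'
assert str(Field('b', 'int32', nullable=False)) == 'b: int32 not null'
```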
[jira] [Updated] (ARROW-611) [Java] TimeVector TypeLayout is incorrectly specified as 64 bit width
[ https://issues.apache.org/jira/browse/ARROW-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated ARROW-611: --- Summary: [Java] TimeVector TypeLayout is incorrectly specified as 64 bit width (was: TimeVector TypeLayout is incorrectly specified as 64 bit width) > [Java] TimeVector TypeLayout is incorrectly specified as 64 bit width > - > > Key: ARROW-611 > URL: https://issues.apache.org/jira/browse/ARROW-611 > Project: Apache Arrow > Issue Type: Bug > Components: Java - Vectors >Affects Versions: 0.2.0 >Reporter: Bryan Cutler >Assignee: Bryan Cutler > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (ARROW-611) TimeVector TypeLayout is incorrectly specified as 64 bit width
[ https://issues.apache.org/jira/browse/ARROW-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905771#comment-15905771 ] Bryan Cutler commented on ARROW-611: I can submit a patch for this > TimeVector TypeLayout is incorrectly specified as 64 bit width > -- > > Key: ARROW-611 > URL: https://issues.apache.org/jira/browse/ARROW-611 > Project: Apache Arrow > Issue Type: Bug > Components: Java - Vectors >Affects Versions: 0.2.0 >Reporter: Bryan Cutler >Assignee: Bryan Cutler > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (ARROW-582) [Java] Add Date/Time Support to JSON File
[ https://issues.apache.org/jira/browse/ARROW-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905765#comment-15905765 ] Bryan Cutler commented on ARROW-582: PR: https://github.com/apache/arrow/pull/366 > [Java] Add Date/Time Support to JSON File > - > > Key: ARROW-582 > URL: https://issues.apache.org/jira/browse/ARROW-582 > Project: Apache Arrow > Issue Type: Improvement > Components: Java - Vectors >Affects Versions: 0.2.0 >Reporter: Bryan Cutler >Assignee: Bryan Cutler > > Need to add Date/Time support to JsonFileReader/Writer for the purpose of > integration testing. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (ARROW-413) DATE type is not specified clearly
[ https://issues.apache.org/jira/browse/ARROW-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15903535#comment-15903535 ] Bryan Cutler commented on ARROW-413: I started working on ARROW-582, to add Date/Time to JSON files, so I thought I would bump this to see if a conclusion has been reached? > DATE type is not specified clearly > -- > > Key: ARROW-413 > URL: https://issues.apache.org/jira/browse/ARROW-413 > Project: Apache Arrow > Issue Type: Bug > Components: Format >Affects Versions: 0.1.0 >Reporter: Uwe L. Korn > > Currently the DATE type is not specified anywhere and needs to be documented. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (ARROW-582) [Java] Add Date/Time Support to JSON File
Bryan Cutler created ARROW-582: -- Summary: [Java] Add Date/Time Support to JSON File Key: ARROW-582 URL: https://issues.apache.org/jira/browse/ARROW-582 Project: Apache Arrow Issue Type: Improvement Components: Java - Vectors Affects Versions: 0.2.0 Reporter: Bryan Cutler Need to add Date/Time support to JsonFileReader/Writer for the purpose of integration testing. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (ARROW-556) [Integration] Can not run Integration tests if different cpp build path
[ https://issues.apache.org/jira/browse/ARROW-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864455#comment-15864455 ] Bryan Cutler commented on ARROW-556: Not a blocker, there are still ways you can run integration testing > [Integration] Can not run Integration tests if different cpp build path > --- > > Key: ARROW-556 > URL: https://issues.apache.org/jira/browse/ARROW-556 > Project: Apache Arrow > Issue Type: Bug >Reporter: Bryan Cutler >Assignee: Wes McKinney >Priority: Minor > > Instructions to run integration tests say to specify the cpp build path and > then export an env var ARROW_CPP_TESTER relative to that build path. The > problem is 2 other vars, STREAM_TO_FILE and FILE_TO_STREAM also rely on the > build path which is made from ARROW_HOME and 'cpp/test-build/debug' and will > fail if that is not the build path used. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (ARROW-556) [Integration] Can not run Integration tests if different cpp build path
Bryan Cutler created ARROW-556: -- Summary: [Integration] Can not run Integration tests if different cpp build path Key: ARROW-556 URL: https://issues.apache.org/jira/browse/ARROW-556 Project: Apache Arrow Issue Type: Bug Reporter: Bryan Cutler Priority: Minor Instructions for running the integration tests say to specify the cpp build path and then export an env var ARROW_CPP_TESTER relative to that build path. The problem is that two other vars, STREAM_TO_FILE and FILE_TO_STREAM, also rely on a build path constructed from ARROW_HOME and 'cpp/test-build/debug', and they will fail if that is not the build path actually used. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
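One way to avoid the hard-coded path is to derive all tool locations from the single ARROW_CPP_TESTER override when it is set. This is a hypothetical sketch, not the actual integration script; the tool file names ('json-integration-test', 'stream-to-file', 'file-to-stream') are assumptions for illustration.

```python
import os.path

def cpp_tool_paths(environ):
    """Derive C++ integration tool paths from the environment.

    If ARROW_CPP_TESTER is set, assume the companion tools live in the
    same build directory; otherwise fall back to the documented default
    of $ARROW_HOME/cpp/test-build/debug.
    """
    tester = environ.get('ARROW_CPP_TESTER')
    if tester:
        build_dir = os.path.dirname(tester)
    else:
        build_dir = os.path.join(environ.get('ARROW_HOME', '.'),
                                 'cpp', 'test-build', 'debug')
    return {
        'tester': tester or os.path.join(build_dir, 'json-integration-test'),
        'stream_to_file': os.path.join(build_dir, 'stream-to-file'),
        'file_to_stream': os.path.join(build_dir, 'file-to-stream'),
    }

# With a custom build path, all three tools follow ARROW_CPP_TESTER.
paths = cpp_tool_paths({'ARROW_CPP_TESTER': '/opt/arrow/build/json-integration-test'})
assert paths['stream_to_file'].endswith('stream-to-file')
```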
[jira] [Commented] (ARROW-421) [Python] Zero-copy buffers read by pyarrow::PyBytesReader must retain a reference to the parent PyBytes to avoid premature garbage collection issues
[ https://issues.apache.org/jira/browse/ARROW-421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816632#comment-15816632 ] Bryan Cutler commented on ARROW-421: I thought I'd give this a shot, here is the [PR|https://github.com/apache/arrow/pull/278] > [Python] Zero-copy buffers read by pyarrow::PyBytesReader must retain a > reference to the parent PyBytes to avoid premature garbage collection issues > > > Key: ARROW-421 > URL: https://issues.apache.org/jira/browse/ARROW-421 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (ARROW-396) Python: Add pyarrow.schema.Schema.equals
[ https://issues.apache.org/jira/browse/ARROW-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned ARROW-396: -- Assignee: Bryan Cutler > Python: Add pyarrow.schema.Schema.equals > > > Key: ARROW-396 > URL: https://issues.apache.org/jira/browse/ARROW-396 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Bryan Cutler >Assignee: Bryan Cutler >Priority: Minor > > Add method in pyarrow to check if 2 schemas are equal. This exists in > Arrow-cpp, just need to call it from the python side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-409) Python: Change pyarrow.Table.dataframe_from_batches API to create Table instead
[ https://issues.apache.org/jira/browse/ARROW-409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726954#comment-15726954 ] Bryan Cutler commented on ARROW-409: PR: https://github.com/apache/arrow/pull/229 > Python: Change pyarrow.Table.dataframe_from_batches API to create Table > instead > --- > > Key: ARROW-409 > URL: https://issues.apache.org/jira/browse/ARROW-409 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Bryan Cutler >Assignee: Bryan Cutler >Priority: Minor > > As discussed in PR https://github.com/apache/arrow/pull/216 the pyarrow.Table > API to convert RecordBatches to pandas.DataFrame would be better/more > flexible as follows: > {noformat} > table = pa.Table.from_batches(batches) > df = table.to_pandas() > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ARROW-409) Python: Change pyarrow.Table.dataframe_from_batches API to create Table instead
Bryan Cutler created ARROW-409: -- Summary: Python: Change pyarrow.Table.dataframe_from_batches API to create Table instead Key: ARROW-409 URL: https://issues.apache.org/jira/browse/ARROW-409 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Bryan Cutler Priority: Minor As discussed in PR https://github.com/apache/arrow/pull/216, the pyarrow.Table API to convert RecordBatches to pandas.DataFrame would be better/more flexible as follows:
{noformat}
table = pa.Table.from_batches(batches)
df = table.to_pandas()
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-396) Python: Add pyarrow.schema.Schema.equals
[ https://issues.apache.org/jira/browse/ARROW-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710161#comment-15710161 ] Bryan Cutler commented on ARROW-396: PR: https://github.com/apache/arrow/pull/221 > Python: Add pyarrow.schema.Schema.equals > > > Key: ARROW-396 > URL: https://issues.apache.org/jira/browse/ARROW-396 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Bryan Cutler >Priority: Minor > > Add method in pyarrow to check if 2 schemas are equal. This exists in > Arrow-cpp, just need to call it from the python side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-396) Python: Add pyarrow.schema.Schema.equals
[ https://issues.apache.org/jira/browse/ARROW-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15709672#comment-15709672 ] Bryan Cutler commented on ARROW-396: I can add this > Python: Add pyarrow.schema.Schema.equals > > > Key: ARROW-396 > URL: https://issues.apache.org/jira/browse/ARROW-396 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Bryan Cutler >Priority: Minor > > Add method in pyarrow to check if 2 schemas are equal. This exists in > Arrow-cpp, just need to call it from the python side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ARROW-396) Python: Add pyarrow.schema.Schema.equals
Bryan Cutler created ARROW-396:
-----------------------------------

Summary: Python: Add pyarrow.schema.Schema.equals
Key: ARROW-396
URL: https://issues.apache.org/jira/browse/ARROW-396
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Bryan Cutler
Priority: Minor

Add a method in pyarrow to check whether two schemas are equal. This already exists in Arrow C++; it just needs to be called from the Python side.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-369) [Python] Add ability to convert multiple record batches at once to pandas
[ https://issues.apache.org/jira/browse/ARROW-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15701155#comment-15701155 ]

Bryan Cutler commented on ARROW-369:
------------------------------------

PR: https://github.com/apache/arrow/pull/216

> [Python] Add ability to convert multiple record batches at once to pandas
> -------------------------------------------------------------------------
>
> Key: ARROW-369
> URL: https://issues.apache.org/jira/browse/ARROW-369
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Reporter: Wes McKinney
> Labels: newbie
>
> Instead of only being able to convert single record batches and tables that
> consist of single ColumnChunks, we should also support the construction of
> pandas DataFrames from multiple RecordBatches. In the simplest approach, we
> would convert each batch to a pandas DataFrame and then concat them all
> together. A second (and preferred) implementation would extend the C++
> function {{ConvertColumnToPandas}} in {{python/src/pyarrow/adapters/pandas.*}}
> to work on chunked columns.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-369) [Python] Add ability to convert multiple record batches at once to pandas
[ https://issues.apache.org/jira/browse/ARROW-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15674374#comment-15674374 ]

Bryan Cutler commented on ARROW-369:
------------------------------------

I could work on this if you don't mind. I was already doing this using concat in some of my local testing, so I'll take a crack at the chunked columns implementation.

> [Python] Add ability to convert multiple record batches at once to pandas
> -------------------------------------------------------------------------
>
> Key: ARROW-369
> URL: https://issues.apache.org/jira/browse/ARROW-369
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Reporter: Wes McKinney
> Labels: newbie
>
> Instead of only being able to convert single record batches and tables that
> consist of single ColumnChunks, we should also support the construction of
> pandas DataFrames from multiple RecordBatches. In the simplest approach, we
> would convert each batch to a pandas DataFrame and then concat them all
> together. A second (and preferred) implementation would extend the C++
> function {{ConvertColumnToPandas}} in {{python/src/pyarrow/adapters/pandas.*}}
> to work on chunked columns.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-368) Document use of LD_LIBRARY_PATH when using Python
[ https://issues.apache.org/jira/browse/ARROW-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15637123#comment-15637123 ]

Bryan Cutler commented on ARROW-368:
------------------------------------

PR: https://github.com/apache/arrow/pull/199

> Document use of LD_LIBRARY_PATH when using Python
> -------------------------------------------------
>
> Key: ARROW-368
> URL: https://issues.apache.org/jira/browse/ARROW-368
> Project: Apache Arrow
> Issue Type: Task
> Components: Python
> Reporter: Bryan Cutler
> Priority: Trivial
> Labels: documentation
>
> Currently, the docs instruct that libarrow.so be installed under
> $ARROW_HOME/lib, but pyarrow needs this location on the library path or it
> will fail with an import error. A note should be added to the Python README
> to put this path in the LD_LIBRARY_PATH env var.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-368) Document use of LD_LIBRARY_PATH when using Python
[ https://issues.apache.org/jira/browse/ARROW-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15637122#comment-15637122 ]

Bryan Cutler commented on ARROW-368:
------------------------------------

[~xhochy] Great! I was about to add it there also, but please do in your PR since you're working on that now

> Document use of LD_LIBRARY_PATH when using Python
> -------------------------------------------------
>
> Key: ARROW-368
> URL: https://issues.apache.org/jira/browse/ARROW-368
> Project: Apache Arrow
> Issue Type: Task
> Components: Python
> Reporter: Bryan Cutler
> Priority: Trivial
> Labels: documentation
>
> Currently, the docs instruct that libarrow.so be installed under
> $ARROW_HOME/lib, but pyarrow needs this location on the library path or it
> will fail with an import error. A note should be added to the Python README
> to put this path in the LD_LIBRARY_PATH env var.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ARROW-368) Document use of LD_LIBRARY_PATH when using Python
Bryan Cutler created ARROW-368:
-----------------------------------

Summary: Document use of LD_LIBRARY_PATH when using Python
Key: ARROW-368
URL: https://issues.apache.org/jira/browse/ARROW-368
Project: Apache Arrow
Issue Type: Task
Components: Python
Reporter: Bryan Cutler
Priority: Trivial

Currently, the docs instruct that libarrow.so be installed under $ARROW_HOME/lib, but pyarrow needs this location on the library path or it will fail with an import error. A note should be added to the Python README to put this path in the LD_LIBRARY_PATH env var.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)