[jira] [Created] (ARROW-2550) [C++] Add missing status codes into arrow::StatusCode::CodeAsString()
Kouhei Sutou created ARROW-2550:
---

Summary: [C++] Add missing status codes into arrow::StatusCode::CodeAsString()
Key: ARROW-2550
URL: https://issues.apache.org/jira/browse/ARROW-2550
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Affects Versions: 0.9.0
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
Fix For: 0.10.0

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2549) [GLib] Apply arrow::StatusCodes changes to GArrowError
Kouhei Sutou created ARROW-2549:
---

Summary: [GLib] Apply arrow::StatusCodes changes to GArrowError
Key: ARROW-2549
URL: https://issues.apache.org/jira/browse/ARROW-2549
Project: Apache Arrow
Issue Type: Improvement
Components: GLib
Affects Versions: 0.9.0
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
Fix For: 0.10.0
[jira] [Created] (ARROW-2548) [Format] Clarify `List` Array example
Uwe L. Korn created ARROW-2548:
--

Summary: [Format] Clarify `List` Array example
Key: ARROW-2548
URL: https://issues.apache.org/jira/browse/ARROW-2548
Project: Apache Arrow
Issue Type: Improvement
Components: Format
Reporter: Uwe L. Korn
Fix For: 0.10.0
[jira] [Created] (ARROW-2547) [Format] Fix off-by-one in List> example
Uwe L. Korn created ARROW-2547:
--

Summary: [Format] Fix off-by-one in List> example
Key: ARROW-2547
URL: https://issues.apache.org/jira/browse/ARROW-2547
Project: Apache Arrow
Issue Type: Improvement
Components: Format
Reporter: Uwe L. Korn
Fix For: 0.10.0
[jira] [Created] (ARROW-2546) [CI] Intermittent npm failures
Antoine Pitrou created ARROW-2546:
-

Summary: [CI] Intermittent npm failures
Key: ARROW-2546
URL: https://issues.apache.org/jira/browse/ARROW-2546
Project: Apache Arrow
Issue Type: Bug
Components: Continuous Integration, JavaScript
Reporter: Antoine Pitrou

See for example https://travis-ci.org/apache/arrow/jobs/375891278 .

{code}
npm WARN deprecated gulp-util@3.0.8: gulp-util is deprecated - replace it, following the guidelines at https://medium.com/gulpjs/gulp-util-ca3b1f9f9ac5
npm WARN deprecated standard-format@1.6.10: standard-format is deprecated in favor of a built-in autofixer in 'standard'. Usage: standard --fix
npm WARN deprecated minimatch@2.0.10: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
npm WARN tar ENOENT: no such file or directory, open '/home/travis/build/apache/arrow/js/node_modules/.staging/google-closure-compiler-2d7bab98/contrib/externs/maps/google_maps_api_v3_23.js'
npm WARN ajv-keywords@3.2.0 requires a peer of ajv@^6.0.0 but none is installed. You must install peer dependencies yourself.
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: fsevents@1.2.3 (node_modules/fsevents):
npm WARN enoent SKIPPING OPTIONAL DEPENDENCY: ENOENT: no such file or directory, rename '/home/travis/build/apache/arrow/js/node_modules/.staging/fsevents-5f35bbaf/node_modules/abbrev' -> '/home/travis/build/apache/arrow/js/node_modules/.staging/abbrev-e214f964'
npm ERR! code EINTEGRITY
npm ERR! sha512-bqB1yS6o9TNA9ZC/MJxM0FZzPnZdtHj0xWK/IZ5khzVqdpGul/R/EIiHRgFXlwTD7PSIaYVnGKq1QgMCu2mnqw== integrity checksum failed when using sha512: wanted sha512-bqB1yS6o9TNA9ZC/MJxM0FZzPnZdtHj0xWK/IZ5khzVqdpGul/R/EIiHRgFXlwTD7PSIaYVnGKq1QgMCu2mnqw== but got sha512-kgTmj+eAwkxGNzcVy5l66pJ3Exmxgj4IdQQ5fK53JTbfThLZFQybsk64V8pq2MMKXcqkkU6/0gGHXKbURv065w==. (4688848 bytes)
npm ERR! A complete log of this run can be found in:
npm ERR!     /home/travis/.npm/_logs/2018-05-07T13_34_45_558Z-debug.log
{code}
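For readers unfamiliar with the EINTEGRITY error in the log above: the "wanted" and "got" values are integrity strings of the form `sha512-` followed by the base64 encoding of the raw SHA-512 digest of the downloaded tarball. The following is a minimal sketch of that comparison in Python, not npm's actual implementation; the function names and the sample bytes are hypothetical.

```python
import base64
import hashlib


def sri_sha512(data: bytes) -> str:
    """Return an integrity string: 'sha512-' + base64 of the raw SHA-512 digest."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def matches_integrity(data: bytes, wanted: str) -> bool:
    """Compare the integrity string of the received bytes with the recorded one."""
    return sri_sha512(data) == wanted


if __name__ == "__main__":
    # Hypothetical payload standing in for a package tarball.
    tarball = b"hypothetical package contents"
    wanted = sri_sha512(tarball)
    print(matches_integrity(tarball, wanted))       # intact download
    print(matches_integrity(tarball[:-1], wanted))  # truncated or corrupted download
```

A mismatch like the one in the log means the bytes received differ from the bytes whose hash was recorded, which is consistent with a truncated or corrupted download, although the log alone does not establish the cause.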
[jira] [Created] (ARROW-2545) [Python] Arrow fails linking against statically-compiled Python
Antoine Pitrou created ARROW-2545:
-

Summary: [Python] Arrow fails linking against statically-compiled Python
Key: ARROW-2545
URL: https://issues.apache.org/jira/browse/ARROW-2545
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.9.0
Reporter: Antoine Pitrou

See https://issues.apache.org/jira/browse/ARROW-1661?focusedCommentId=16462745&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16462745 :
to link statically against {{libpythonXX.a}}, you need to add in some system libraries such as {{libutil}}. Otherwise some symbols end up unresolved.
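One way to see which system libraries a given interpreter expects at link time is to ask the interpreter itself. The sketch below reads the `LIBS`, `SYSLIBS` and `LIBPL` build variables through the standard `sysconfig` module. It is only an illustration, not how Arrow's build system resolves them, and the helper name `static_link_libs` is hypothetical. The values vary by platform and Python build, and may be missing entirely on builds that record no Makefile variables.

```python
import sysconfig


def static_link_libs() -> list:
    """Collect the extra linker flags recorded when this CPython was built."""
    flags = []
    for var in ("LIBS", "SYSLIBS"):
        # get_config_var returns None when the variable is not recorded.
        value = sysconfig.get_config_var(var)
        if value:
            flags.extend(value.split())
    return flags


if __name__ == "__main__":
    # LIBPL is the directory that holds the static library on builds that ship one.
    print("static library dir:", sysconfig.get_config_var("LIBPL"))
    print("extra link flags:", " ".join(static_link_libs()))
```

If the printed flags include something like `-lutil`, that library has to be passed to the linker alongside {{libpythonXX.a}}.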
Re: Writing empty strings to parquet files
Hi Wes:

Thanks for your message.

I would say that both test_pandas_parquet_1_0_rountrip and test_pandas_parquet_2_0_rountrip (in arrow/python/pyarrow/tests/test_parquet.py) already test this. Sorry I didn’t realize this sooner.

All the best,

Sergio Carrascoso

> On 5 May 2018, at 01:31, Wes McKinney wrote:
>
> Thanks Sergio. If we don't have any unit tests explicitly testing
> this, it would be a good idea to add some anyway.
>
> - Wes
>
> On Fri, May 4, 2018 at 12:26 PM, wrote:
>> Hi Uwe:
>>
>> Thanks a lot for your feedback.
>>
>> While preparing a simple example to reproduce this issue, I have been able
>> to get the expected behavior (empty strings properly written as ‘’ in the
>> parquet file).
>> So actually there’s no problem with Parquet.write_table.
>>
>> The problem was rather a bug whereby two steps in my process were in the
>> wrong order, so unicode formatting was being applied to None values earlier
>> than expected, turning them into ‘None’.
>>
>> Again, thank you very much and apologies for the noise.
>>
>> Best,
>>
>> Sergio Carrascoso
>>
>>> On 4 May 2018, at 10:54, Uwe L. Korn wrote:
>>>
>>> Hello Sergio,
>>>
>>> this is definitely unwanted behaviour. Can you open an issue on
>>> https://issues.apache.org/jira/projects/PARQUET and provide a minimal
>>> reproducing example? There is definitely a difference between empty strings
>>> and null strings. Parquet also supports the differentiation, thus we should
>>> support roundtripping them.
>>>
>>> Uwe
>>>
>>> On Thu, May 3, 2018, at 8:47 AM, scarrasc...@ravenpack.com wrote:
>>>> Hi:
>>>>
>>>> I would like to know if there is any way in PyArrow to write empty string
>>>> values to a parquet file. When I use Parquet.write_table, if any column
>>>> contains empty string values, they end up as None in the parquet file. My
>>>> process depends on these values being properly written as empty strings in
>>>> the parquet files.
>>>>
>>>> To provide some context, my current workflow is the following:
>>>> - Read content from json files (using Pandas.read_json)
>>>> - Convert the corresponding dataframe to a PyArrow table (using
>>>>   PyArrow.Table.from_pandas)
>>>> - Finally, write the table to a parquet file (using Parquet.write_table)
>>>>
>>>> I have done some checks during the process, and the empty string values
>>>> are being honored until the writing step to a parquet file. The options
>>>> for the write_table method don't provide any specific option for this. Is
>>>> this behavior (write '' as None) an unavoidable default? Is there any
>>>> other way to write the parquet files where I have more options to deal
>>>> with this?
>>>>
>>>> Any hint or feedback will be greatly appreciated.
>>>>
>>>> Thanks a lot in advance, all the best.
>>>>
>>>> Sergio Carrascoso
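The root cause described in this thread (string formatting applied to None before null handling) can be reproduced without PyArrow. The following is a minimal sketch of that ordering problem, assuming the formatting step behaves like Python's `str()`. The function names are hypothetical and are not taken from Sergio's actual pipeline.

```python
def format_value(value):
    """Stand-in for the unicode formatting step (hypothetical)."""
    return str(value).strip()


def wrong_order(values):
    # Formatting runs on every value, including None, which becomes the text 'None'.
    return [format_value(v) for v in values]


def right_order(values):
    # Nulls are set aside first; only real strings are formatted.
    return [None if v is None else format_value(v) for v in values]


if __name__ == "__main__":
    column = ["abc", "", None]
    print(wrong_order(column))  # ['abc', '', 'None']
    print(right_order(column))  # ['abc', '', None]
```

In both orderings the empty string stays an empty string; only the null value is affected, which matches the conclusion in the thread that the write step itself was not at fault.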
[jira] [Created] (ARROW-2544) [CI] Run C++ tests with two jobs on Travis-CI
Antoine Pitrou created ARROW-2544:
-

Summary: [CI] Run C++ tests with two jobs on Travis-CI
Key: ARROW-2544
URL: https://issues.apache.org/jira/browse/ARROW-2544
Project: Apache Arrow
Issue Type: Improvement
Components: Continuous Integration
Reporter: Antoine Pitrou
Assignee: Omer Katz

See https://github.com/apache/arrow/pull/1899