[jira] [Commented] (ARROW-8175) [Python] Setup type checking with mypy
[ https://issues.apache.org/jira/browse/ARROW-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644561#comment-17644561 ]

Kyle Barron commented on ARROW-8175:
------------------------------------

Working with pyarrow in e.g. vscode is really painful at the moment because of the lack of user-facing types. vscode can see the top-level functions, but as soon as you create a class instance, it becomes `Any`.

{code:python}
import pyarrow.parquet as pq

table = pq.read_table()
table   # <-- Any
table.  # <-- no auto-completions
{code}

This may be a slightly different ask than the title of this ticket, since I'm referring to the developer experience while writing code, not _checking_ code. I can create a separate ticket if desired. For now, I may create my own third-party type hints for this using mypy's stubgen.

> [Python] Setup type checking with mypy
> --------------------------------------
>
>                 Key: ARROW-8175
>                 URL: https://issues.apache.org/jira/browse/ARROW-8175
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Continuous Integration, Python
>            Reporter: Uwe Korn
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Get mypy checks running, activate things like {{check_untyped_defs}} later.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
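A hand-rolled stub of the sort the commenter describes could start like this; a hypothetical, heavily simplified `.pyi` fragment (the real `read_table` takes many more parameters, and `Table` actually lives in `pyarrow` proper, not `pyarrow.parquet`):

```python
# parquet.pyi -- hypothetical, heavily simplified stub fragment
from typing import Any

class Table:
    """Stand-in for pyarrow.Table (which really lives in pyarrow,
    not pyarrow.parquet); only two members shown."""
    @property
    def num_rows(self) -> int: ...
    def to_pandas(self) -> Any: ...

def read_table(source: Any) -> Table: ...
```

With a stub like this on the search path, an editor's language server can resolve `pq.read_table(...)` to `Table` instead of `Any` and offer completions on the result.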
[jira] [Resolved] (ARROW-17404) [Java] Consolidate JNI compilation #2
[ https://issues.apache.org/jira/browse/ARROW-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Dali Susanibar Arce resolved ARROW-17404.
-----------------------------------------------
    Resolution: Fixed

> [Java] Consolidate JNI compilation #2
> -------------------------------------
>
>                 Key: ARROW-17404
>                 URL: https://issues.apache.org/jira/browse/ARROW-17404
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Documentation, Java
>            Reporter: David Dali Susanibar Arce
>            Priority: Major
>
> *Umbrella ticket for the Java JNI compilation consolidation initiative, part 2*
> The initial part of this initiative consolidated the ORC/Dataset code and
> separated the JNI CMakeLists.txt compilation.
> This second part consists of:
> 1. Make the Java library able to compile with a single mvn command
> 2. Make the Java library able to compile against an installed libarrow
> 3. Migrate the remaining Java-specific C++ CMakeLists.txt into the Java
>    project: ORC / Dataset / Gandiva
> 4. Add a Windows build script that produces DLLs
> 5. Incorporate the Windows DLLs into the Maven packages
> 6. Migrate ORC JNI to use the C Data Interface

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (ARROW-17464) [C++] Support float16 in writing/reading parquet
[ https://issues.apache.org/jira/browse/ARROW-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644536#comment-17644536 ]

Anja Boskovic commented on ARROW-17464:
---------------------------------------

An update! PARQUET-1222 (https://issues.apache.org/jira/browse/PARQUET-1222), which was a blocker for adding float16 support to Parquet, has been merged.

> [C++] Support float16 in writing/reading parquet
> ------------------------------------------------
>
>                 Key: ARROW-17464
>                 URL: https://issues.apache.org/jira/browse/ARROW-17464
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Parquet, Python
>            Reporter: Anja Boskovic
>            Priority: Major
>              Labels: parquet, pull-request-available
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Half-float values are not supported in Parquet. Here is a previous issue
> that discusses this: https://issues.apache.org/jira/browse/PARQUET-1647
> So, this will not work:
> {code:python}
> import pyarrow as pa
> import pyarrow.parquet as pq
> import numpy as np
>
> arr = pa.array(np.float16([0.1, 2.2, 3]))
> table = pa.table({'a': arr})
> pq.write_table(table, "test_halffloat.parquet")
> {code}
> This is a proposal to store float16 values in Parquet as FixedSizeBinary,
> and then restore them to float16 when reading them back in.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
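The FixedSizeBinary round trip proposed above can be sketched in plain Python with the `struct` module's half-precision format code `'e'` (a sketch of the storage idea only, not Arrow's actual implementation):

```python
import struct

def encode_half(values):
    """Pack floats into 2-byte IEEE half-precision payloads, mimicking
    the proposed FixedSizeBinary(2) Parquet storage."""
    return [struct.pack('<e', v) for v in values]

def decode_half(payloads):
    """Restore half-precision floats from the 2-byte payloads."""
    return [struct.unpack('<e', p)[0] for p in payloads]

payloads = encode_half([0.1, 2.2, 3.0])
assert all(len(p) == 2 for p in payloads)  # fixed width of 2 bytes
# 0.1 is not exactly representable in float16, so the round trip is
# stable at half precision rather than recovering the original double
roundtrip = decode_half(payloads)
assert decode_half(encode_half(roundtrip)) == roundtrip
```

On the read path, a reader that recognizes the 2-byte fixed-size binary column (e.g. via metadata) would run `decode_half` to rebuild the half-float array.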
[jira] [Commented] (ARROW-17404) [Java] Consolidate JNI compilation #2
[ https://issues.apache.org/jira/browse/ARROW-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644519#comment-17644519 ] Kouhei Sutou commented on ARROW-17404: -- [~dsusanibara] Can we close this? Is there any more task to be resolved? > [Java] Consolidate JNI compilation #2 > - > > Key: ARROW-17404 > URL: https://issues.apache.org/jira/browse/ARROW-17404 > Project: Apache Arrow > Issue Type: Bug > Components: Documentation, Java >Reporter: David Dali Susanibar Arce >Priority: Major > > *Umbrella ticket for consolidating Java JNI compilation initiative #2* > Initial part of consolidate JNI Java initiative was: Consolidate ORC/Dataset > code and Separate JNI CMakeLists.txt compilation. > This 2nd part consist on: > 1.- Make the Java library able to compile with a single mvn command > 2.- Make Java library able to compile from an installed libarrow > 3.- Migrate remaining C++ CMakeLists.txt specific to Java into the Java > project: ORC / Dataset / Gandiva > 4.- Add windows build script that produces DLLs > 5.- Incorporate Windows DLLs into the maven packages > 6.- Migrate ORC JNI to use C-Data-Interface -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-12724) [C++][Docs] Add documentation for authoring compute kernels
[ https://issues.apache.org/jira/browse/ARROW-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644485#comment-17644485 ]

Aldrin Montana commented on ARROW-12724:
----------------------------------------

Back from vacation and other tasks; I will try to cut a reviewable draft this week and maybe break the doc up into incremental pieces to make it easier to release.

> [C++][Docs] Add documentation for authoring compute kernels
> -----------------------------------------------------------
>
>                 Key: ARROW-12724
>                 URL: https://issues.apache.org/jira/browse/ARROW-12724
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Documentation
>            Reporter: Eduardo Ponce
>            Assignee: Aldrin Montana
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> To help incoming developers work in the compute layer, it would be good
> to have documentation on the process to follow for authoring a new compute
> kernel. This document can help demystify the inner workings of the functions
> and data structures in the compute layer.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (ARROW-17464) [C++] Support float16 in writing/reading parquet
[ https://issues.apache.org/jira/browse/ARROW-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644454#comment-17644454 ] Apache Arrow JIRA Bot commented on ARROW-17464: --- This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned per [project policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment]. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon. > [C++] Support float16 in writing/reading parquet > > > Key: ARROW-17464 > URL: https://issues.apache.org/jira/browse/ARROW-17464 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Parquet, Python >Reporter: Anja Boskovic >Assignee: Anja Boskovic >Priority: Major > Labels: parquet, pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Half-float values are not supported in Parquet. Here is a previous issue that > talks about that: https://issues.apache.org/jira/browse/PARQUET-1647 > So, this will not work: > {code:java} > import pyarrow as pa > import pyarrow.parquet as pq > import numpy as np > arr = pa.array(np.float16([0.1, 2.2, 3])) > table = pa.table({'a': arr}) > pq.write_table(table, "test_halffloat.parquet") {code} > {{This is a proposal to store float16 values in Parquet as FixedSizeBinary, > and then restore them to float16 when reading them back in.}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-17404) [Java] Consolidate JNI compilation #2
[ https://issues.apache.org/jira/browse/ARROW-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644455#comment-17644455 ] Apache Arrow JIRA Bot commented on ARROW-17404: --- This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned per [project policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment]. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon. > [Java] Consolidate JNI compilation #2 > - > > Key: ARROW-17404 > URL: https://issues.apache.org/jira/browse/ARROW-17404 > Project: Apache Arrow > Issue Type: Bug > Components: Documentation, Java >Reporter: David Dali Susanibar Arce >Assignee: David Dali Susanibar Arce >Priority: Major > > *Umbrella ticket for consolidating Java JNI compilation initiative #2* > Initial part of consolidate JNI Java initiative was: Consolidate ORC/Dataset > code and Separate JNI CMakeLists.txt compilation. > This 2nd part consist on: > 1.- Make the Java library able to compile with a single mvn command > 2.- Make Java library able to compile from an installed libarrow > 3.- Migrate remaining C++ CMakeLists.txt specific to Java into the Java > project: ORC / Dataset / Gandiva > 4.- Add windows build script that produces DLLs > 5.- Incorporate Windows DLLs into the maven packages > 6.- Migrate ORC JNI to use C-Data-Interface -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (ARROW-17464) [C++] Support float16 in writing/reading parquet
[ https://issues.apache.org/jira/browse/ARROW-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Arrow JIRA Bot reassigned ARROW-17464: - Assignee: (was: Anja Boskovic) > [C++] Support float16 in writing/reading parquet > > > Key: ARROW-17464 > URL: https://issues.apache.org/jira/browse/ARROW-17464 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Parquet, Python >Reporter: Anja Boskovic >Priority: Major > Labels: parquet, pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Half-float values are not supported in Parquet. Here is a previous issue that > talks about that: https://issues.apache.org/jira/browse/PARQUET-1647 > So, this will not work: > {code:java} > import pyarrow as pa > import pyarrow.parquet as pq > import numpy as np > arr = pa.array(np.float16([0.1, 2.2, 3])) > table = pa.table({'a': arr}) > pq.write_table(table, "test_halffloat.parquet") {code} > {{This is a proposal to store float16 values in Parquet as FixedSizeBinary, > and then restore them to float16 when reading them back in.}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (ARROW-17404) [Java] Consolidate JNI compilation #2
[ https://issues.apache.org/jira/browse/ARROW-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Arrow JIRA Bot reassigned ARROW-17404: - Assignee: (was: David Dali Susanibar Arce) > [Java] Consolidate JNI compilation #2 > - > > Key: ARROW-17404 > URL: https://issues.apache.org/jira/browse/ARROW-17404 > Project: Apache Arrow > Issue Type: Bug > Components: Documentation, Java >Reporter: David Dali Susanibar Arce >Priority: Major > > *Umbrella ticket for consolidating Java JNI compilation initiative #2* > Initial part of consolidate JNI Java initiative was: Consolidate ORC/Dataset > code and Separate JNI CMakeLists.txt compilation. > This 2nd part consist on: > 1.- Make the Java library able to compile with a single mvn command > 2.- Make Java library able to compile from an installed libarrow > 3.- Migrate remaining C++ CMakeLists.txt specific to Java into the Java > project: ORC / Dataset / Gandiva > 4.- Add windows build script that produces DLLs > 5.- Incorporate Windows DLLs into the maven packages > 6.- Migrate ORC JNI to use C-Data-Interface -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-12264) [C++][Dataset] Handle NaNs correctly in Parquet predicate push-down
[ https://issues.apache.org/jira/browse/ARROW-12264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou updated ARROW-12264:
-----------------------------------
    Issue Type: Bug  (was: Task)

> [C++][Dataset] Handle NaNs correctly in Parquet predicate push-down
> -------------------------------------------------------------------
>
>                 Key: ARROW-12264
>                 URL: https://issues.apache.org/jira/browse/ARROW-12264
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Parquet
>            Reporter: Antoine Pitrou
>            Priority: Major
>
> The Parquet spec (in parquet.thrift) says the following about handling of
> floating-point statistics:
> {code}
>    * (*) Because the sorting order is not specified properly for floating
>    *     point values (relations vs. total ordering) the following
>    *     compatibility rules should be applied when reading statistics:
>    *     - If the min is a NaN, it should be ignored.
>    *     - If the max is a NaN, it should be ignored.
>    *     - If the min is +0, the row group may contain -0 values as well.
>    *     - If the max is -0, the row group may contain +0 values as well.
>    *     - When looking for NaN values, min and max should be ignored.
> {code}
> It appears that the dataset code uses the following filter expression when
> doing Parquet predicate push-down (in {{file_parquet.cc}}):
> {code:c++}
> return and_(greater_equal(field_expr, literal(min)),
>             less_equal(field_expr, literal(max)));
> {code}
> A NaN value will fail that filter and yet may be found in the given Parquet
> column chunk.
> We may instead need a "greater_equal_or_nan" comparison that returns true if
> either value is NaN.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
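The failure mode described above is easy to demonstrate in plain Python, together with a sketch of the suggested NaN-tolerant comparison (the function name follows the comment; the min/max statistics here are made up):

```python
import math

# Made-up row-group statistics for illustration.
row_group_min, row_group_max = 0.0, 10.0
nan = float("nan")

# Every comparison with NaN is False, so the naive
# `min <= x <= max` push-down filter wrongly prunes the chunk.
assert not (nan >= row_group_min and nan <= row_group_max)

def greater_equal_or_nan(value, bound):
    """Sketch of the suggested comparison: a NaN on either side must
    not prune, since min/max are to be ignored when looking for NaNs."""
    return math.isnan(value) or math.isnan(bound) or value >= bound

assert greater_equal_or_nan(nan, row_group_min)
assert not greater_equal_or_nan(-1.0, row_group_min)
```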
[jira] [Comment Edited] (ARROW-12264) [C++][Dataset] Handle NaNs correctly in Parquet predicate push-down
[ https://issues.apache.org/jira/browse/ARROW-12264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644354#comment-17644354 ] Antoine Pitrou edited comment on ARROW-12264 at 12/7/22 2:08 PM: - cc [~westonpace] was (Author: pitrou): cc @westonpace > [C++][Dataset] Handle NaNs correctly in Parquet predicate push-down > --- > > Key: ARROW-12264 > URL: https://issues.apache.org/jira/browse/ARROW-12264 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Parquet >Reporter: Antoine Pitrou >Priority: Major > > The Parquet spec (in parquet.thrift) says the following about handling of > floating-point statistics: > {code} >* (*) Because the sorting order is not specified properly for floating >* point values (relations vs. total ordering) the following >* compatibility rules should be applied when reading statistics: >* - If the min is a NaN, it should be ignored. >* - If the max is a NaN, it should be ignored. >* - If the min is +0, the row group may contain -0 values as well. >* - If the max is -0, the row group may contain +0 values as well. >* - When looking for NaN values, min and max should be ignored. > {code} > It appears that the dataset code uses the following filter expression when > doing Parquet predicate push-down (in {{file_parquet.cc}}): > {code:c++} > return and_(greater_equal(field_expr, literal(min)), > less_equal(field_expr, literal(max))); > {code} > A NaN value will fail that filter and yet may be found in the given Parquet > column chunk. > We may instead need a "greater_equal_or_nan" comparison that returns true if > either value is NaN. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-12264) [C++][Dataset] Handle NaNs correctly in Parquet predicate push-down
[ https://issues.apache.org/jira/browse/ARROW-12264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644354#comment-17644354 ] Antoine Pitrou commented on ARROW-12264: cc @westonpace > [C++][Dataset] Handle NaNs correctly in Parquet predicate push-down > --- > > Key: ARROW-12264 > URL: https://issues.apache.org/jira/browse/ARROW-12264 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Parquet >Reporter: Antoine Pitrou >Priority: Major > > The Parquet spec (in parquet.thrift) says the following about handling of > floating-point statistics: > {code} >* (*) Because the sorting order is not specified properly for floating >* point values (relations vs. total ordering) the following >* compatibility rules should be applied when reading statistics: >* - If the min is a NaN, it should be ignored. >* - If the max is a NaN, it should be ignored. >* - If the min is +0, the row group may contain -0 values as well. >* - If the max is -0, the row group may contain +0 values as well. >* - When looking for NaN values, min and max should be ignored. > {code} > It appears that the dataset code uses the following filter expression when > doing Parquet predicate push-down (in {{file_parquet.cc}}): > {code:c++} > return and_(greater_equal(field_expr, literal(min)), > less_equal(field_expr, literal(max))); > {code} > A NaN value will fail that filter and yet may be found in the given Parquet > column chunk. > We may instead need a "greater_equal_or_nan" comparison that returns true if > either value is NaN. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-14799) [C++] Adding tabular pretty printing of Table / RecordBatch
[ https://issues.apache.org/jira/browse/ARROW-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644352#comment-17644352 ]

Joris Van den Bossche commented on ARROW-14799:
-----------------------------------------------

If we tackle this in C++, it might be worth checking out duckdb's implementation.

If we decide to tackle this in the bindings, for Python it might be worth checking out ibis' implementation (using rich; they recently revamped their table representation, including support for nested columns).

> [C++] Adding tabular pretty printing of Table / RecordBatch
> -----------------------------------------------------------
>
>                 Key: ARROW-14799
>                 URL: https://issues.apache.org/jira/browse/ARROW-14799
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Priority: Major
>
> It would be nice to show a "preview" (e.g. xx number of first and last rows)
> of a Table or RecordBatch in a traditional tabular form (like pandas
> DataFrames or R data.frame / tibbles have, or in a format that resembles
> markdown tables).
> This could also be added in the bindings, but we could also do it at the C++
> level to benefit multiple bindings at once.
> Based on a quick search, there is https://github.com/p-ranav/tabulate which
> could be vendored (it has a single-include version).
> I suppose that nested data types could represent a challenge for how to
> include them in a tabular format, though.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
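The desired output shape can be sketched in plain Python over a mapping of column names to value lists (illustrative only; an eventual C++ implementation, with or without a vendored tabulate, would differ):

```python
def preview(columns, head=2, tail=2):
    """Render a tabular preview of a column-name -> list-of-values
    mapping, eliding the middle rows with '...'."""
    names = list(columns)
    nrows = len(columns[names[0]])
    # Column widths must fit the header, every value, and the "..." filler.
    widths = {n: max(len(n), 3, *(len(str(v)) for v in columns[n]))
              for n in names}

    def fmt(cells):
        return " | ".join(str(c).rjust(widths[n]) for n, c in zip(names, cells))

    lines = [fmt(names), "-+-".join("-" * widths[n] for n in names)]
    shown = (range(nrows) if nrows <= head + tail
             else [*range(head), None, *range(nrows - tail, nrows)])
    for i in shown:
        row = ["..."] * len(names) if i is None else [columns[n][i] for n in names]
        lines.append(fmt(row))
    return "\n".join(lines)
```

For example, `preview({"a": [1, 2, 3, 4, 5], "b": ["x", "y", "z", "w", "v"]})` renders a header, a separator, the first two rows, an ellipsis row, and the last two rows.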
[jira] [Commented] (ARROW-13240) [C++][Parquet] Page statistics not written in v2?
[ https://issues.apache.org/jira/browse/ARROW-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644350#comment-17644350 ]

Antoine Pitrou commented on ARROW-13240:
----------------------------------------

[~jorgecarleitao] Could you try to check if that still happens with the latest PyArrow?

> [C++][Parquet] Page statistics not written in v2?
> -------------------------------------------------
>
>                 Key: ARROW-13240
>                 URL: https://issues.apache.org/jira/browse/ARROW-13240
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Jorge Leitão
>            Priority: Major
>
> While working on integration tests of parquet2 against pyarrow, I noticed
> that page statistics are only written by pyarrow when using version 1.
> I do not have an easy way to reproduce this within pyarrow, as I am not sure
> how to access individual pages from a column chunk, but it is something I
> observe when trying to integrate.
> The row group stats are still written; this only affects page statistics.
> pyarrow call:
> ```
> pa.parquet.write_table(
>     t,
>     path,
>     version="2.0",
>     data_page_version="2.0",
>     write_statistics=True,
> )
> ```
> Changing version to "1.0" does not impact this behavior, suggesting that the
> specific option causing it is data_page_version="2.0".

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (ARROW-13240) [C++][Parquet] Page statistics not written in v2?
[ https://issues.apache.org/jira/browse/ARROW-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644349#comment-17644349 ] Antoine Pitrou commented on ARROW-13240: [~emkornfield] When would that have happened? > [C++][Parquet] Page statistics not written in v2? > - > > Key: ARROW-13240 > URL: https://issues.apache.org/jira/browse/ARROW-13240 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Jorge Leitão >Priority: Major > > While working in integration tests of parquet2 against pyarrow, I noticed > that page statistics are only written by pyarrow when using version 1. > I do not have an easy way to reproduce this within pyarrow as I am not sure > how to access individual pages from a column chunk, but it is something that > I observe when trying to integrate. > The row group stats are still written, this only affects page statistics. > pyarrow call: > ``` > pa.parquet.write_table( > t, > path, > version="2.0", > data_page_version="2.0", > write_statistics=True, > ) > ``` > changing version to "1.0" does not impact this behavior, suggesting that the > specific option causing this behavior is the data_page_version="2.0". -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-13240) [C++][Parquet] Page statistics not written in v2?
[ https://issues.apache.org/jira/browse/ARROW-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-13240: --- Priority: Major (was: Minor) > [C++][Parquet] Page statistics not written in v2? > - > > Key: ARROW-13240 > URL: https://issues.apache.org/jira/browse/ARROW-13240 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Jorge Leitão >Priority: Major > > While working in integration tests of parquet2 against pyarrow, I noticed > that page statistics are only written by pyarrow when using version 1. > I do not have an easy way to reproduce this within pyarrow as I am not sure > how to access individual pages from a column chunk, but it is something that > I observe when trying to integrate. > The row group stats are still written, this only affects page statistics. > pyarrow call: > ``` > pa.parquet.write_table( > t, > path, > version="2.0", > data_page_version="2.0", > write_statistics=True, > ) > ``` > changing version to "1.0" does not impact this behavior, suggesting that the > specific option causing this behavior is the data_page_version="2.0". -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (ARROW-18123) [Python] Cannot use multi-byte characters in file names in write_table
[ https://issues.apache.org/jira/browse/ARROW-18123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche resolved ARROW-18123.
-------------------------------------------
    Resolution: Fixed

Issue resolved by pull request 14764
https://github.com/apache/arrow/pull/14764

> [Python] Cannot use multi-byte characters in file names in write_table
> ----------------------------------------------------------------------
>
>                 Key: ARROW-18123
>                 URL: https://issues.apache.org/jira/browse/ARROW-18123
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 9.0.0
>            Reporter: SHIMA Tatsuya
>            Assignee: Miles Granger
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 11.0.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Error when specifying a file path containing multi-byte characters in
> {{pyarrow.parquet.write_table}}. For example, use {{例.parquet}} as the
> file path.
> {code:python}
> Python 3.10.7 (main, Oct  5 2022, 14:33:54) [GCC 10.2.1 20210110] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pandas as pd
> >>> import numpy as np
> >>> import pyarrow as pa
> >>> df = pd.DataFrame({'one': [-1, np.nan, 2.5],
> ...                    'two': ['foo', 'bar', 'baz'],
> ...                    'three': [True, False, True]},
> ...                   index=list('abc'))
> >>> table = pa.Table.from_pandas(df)
> >>> import pyarrow.parquet as pq
> >>> pq.write_table(table, '例.parquet')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 2920, in write_table
>     with ParquetWriter(
>   File "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 911, in __init__
>     filesystem, path = _resolve_filesystem_and_path(
>   File "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/fs.py", line 184, in _resolve_filesystem_and_path
>     filesystem, path = FileSystem.from_uri(path)
>   File "pyarrow/_fs.pyx", line 463, in pyarrow._fs.FileSystem.from_uri
>   File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
>   File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Cannot parse URI: '例.parquet'
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
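The error above comes from `FileSystem.from_uri` rejecting a raw multi-byte string. A plain-Python sketch of one possible workaround, percent-encoding the path into a proper file URI (this is only an illustration, not the fix that was merged), looks like:

```python
from urllib.parse import quote, urlparse

# '例.parquet' is rejected as a URI because it contains raw multi-byte
# characters.  Percent-encoding the path yields a parseable file URI.
path = "例.parquet"
uri = "file:///tmp/" + quote(path)   # hypothetical base directory

assert quote(path) == "%E4%BE%8B.parquet"
assert urlparse(uri).scheme == "file"
```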
[jira] [Resolved] (ARROW-18320) [C++] Flight client may crash due to improper Result/Status conversion
[ https://issues.apache.org/jira/browse/ARROW-18320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Li resolved ARROW-18320.
------------------------------
    Fix Version/s: 11.0.0
       Resolution: Fixed

Resolved by [https://github.com/apache/arrow/pull/14859]

> [C++] Flight client may crash due to improper Result/Status conversion
> ----------------------------------------------------------------------
>
>                 Key: ARROW-18320
>                 URL: https://issues.apache.org/jira/browse/ARROW-18320
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, FlightRPC
>    Affects Versions: 6.0.0
>            Reporter: David Li
>            Assignee: David Li
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 11.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Reported on user@
> https://lists.apache.org/thread/84z329t1djhnbr5bq936v4hr8cyngj2l
> {noformat}
> I have an issue on my project: we have a query execution engine that
> returns result data as a flight stream and a C++ client that receives
> the stream. If a query has no results but the result schema implies
> dictionary-encoded fields in the results, the client app crashes.
>
> The cause is in cpp/src/arrow/flight/client.cc:461:
>
> ::arrow::Result<std::unique_ptr<ipc::Message>> ReadNextMessage() override {
>   if (stream_finished_) {
>     return nullptr;
>   }
>   internal::FlightData* data;
>   {
>     auto guard = read_mutex_ ? std::unique_lock<std::mutex>(*read_mutex_)
>                              : std::unique_lock<std::mutex>();
>     peekable_reader_->Next(&data);
>   }
>   if (!data) {
>     stream_finished_ = true;
>     return stream_->Finish(Status::OK());  // Here the issue
>   }
>   // Validate IPC message
>   auto result = data->OpenMessage();
>   if (!result.ok()) {
>     return stream_->Finish(std::move(result).status());
>   }
>   *app_metadata_ = std::move(data->app_metadata);
>   return result;
> }
>
> The method returns a Result object while stream_->Finish(...) returns a
> Status. So there is an implicit conversion from Status to Result that
> causes the Result(Status) constructor to be called, but that constructor
> expects only error statuses, which in turn causes the app to fail:
>
> /// Constructs a Result object with the given non-OK Status object. All
> /// calls to ValueOrDie() on this object will abort. The given `status` must
> /// not be an OK status, otherwise this constructor will abort.
> ///
> /// This constructor is not declared explicit so that a function with a return
> /// type of `Result<T>` can return a Status object, and the status will be
> /// implicitly converted to the appropriate return type as a matter of
> /// convenience.
> ///
> /// \param status The non-OK Status object to initialize to.
> Result(const Status& status) noexcept  // NOLINT(runtime/explicit)
>     : status_(status) {
>   if (ARROW_PREDICT_FALSE(status.ok())) {
>     internal::DieWithMessage(std::string("Constructed with a non-error status: ") +
>                              status.ToString());
>   }
> }
>
> Is there a way to work around or fix it? We use Arrow 6.0.0, but it seems
> that the issue exists in all later versions.
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (ARROW-18427) [C++] Support negative tolerance in `AsofJoinNode`
[ https://issues.apache.org/jira/browse/ARROW-18427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaron Gvili updated ARROW-18427: Description: Currently, `AsofJoinNode` supports a tolerance that is non-negative, allowing past-joining, i.e., joining right-table rows with a timestamp at or before that of the left-table row. This issue will add support for a negative tolerance, which would allow future-joining too. (was: Currently, `AsofJoinNode` supports a tolerance that is non-negative, allowing past-joining, i.e., joining right-table rows with a timestamp at or before that of the left-table row. This issue will add support for a positive tolerance, which would allow future-joining too.) > [C++] Support negative tolerance in `AsofJoinNode` > -- > > Key: ARROW-18427 > URL: https://issues.apache.org/jira/browse/ARROW-18427 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Yaron Gvili >Assignee: Yaron Gvili >Priority: Major > > Currently, `AsofJoinNode` supports a tolerance that is non-negative, allowing > past-joining, i.e., joining right-table rows with a timestamp at or before > that of the left-table row. This issue will add support for a negative > tolerance, which would allow future-joining too. -- This message was sent by Atlassian Jira (v8.20.10#820010)
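The tolerance semantics discussed above can be sketched in plain Python. This is an illustration of the windowing idea only: `matching_window` and `asof_join` are made-up helpers, not `AsofJoinNode`'s API, and the exact bounds of the matching window are an assumption here.

```python
def matching_window(left_ts, tolerance):
    """Window of right-side timestamps joinable with left_ts.  With a
    non-negative tolerance (current behavior) only rows at or before
    left_ts match (past-joining); a negative tolerance (this issue)
    flips the window forward (future-joining)."""
    if tolerance >= 0:
        return (left_ts - tolerance, left_ts)   # past-joining
    return (left_ts, left_ts - tolerance)       # future-joining

def asof_join(left_ts, right_rows, tolerance):
    """Pick the value of the right row closest to left_ts within the
    window; right_rows is a list of (timestamp, value) pairs."""
    lo, hi = matching_window(left_ts, tolerance)
    candidates = [(ts, v) for ts, v in right_rows if lo <= ts <= hi]
    if not candidates:
        return None
    return min(candidates, key=lambda r: abs(r[0] - left_ts))[1]
```

For example, with right rows at timestamps 1, 5, and 9, a left row at 6 matches the row at 5 under a tolerance of 2, but matches the row at 9 under a tolerance of -4.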
[jira] [Updated] (ARROW-18427) [C++] Support negative tolerance in `AsofJoinNode`
[ https://issues.apache.org/jira/browse/ARROW-18427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaron Gvili updated ARROW-18427: Summary: [C++] Support negative tolerance in `AsofJoinNode` (was: [C++] Support negative toletance in `AsofJoinNode`) > [C++] Support negative tolerance in `AsofJoinNode` > -- > > Key: ARROW-18427 > URL: https://issues.apache.org/jira/browse/ARROW-18427 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Yaron Gvili >Assignee: Yaron Gvili >Priority: Major > > Currently, `AsofJoinNode` supports a tolerance that is non-negative, allowing > past-joining, i.e., joining right-table rows with a timestamp at or before > that of the left-table row. This issue will add support for a positive > tolerance, which would allow future-joining too. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18427) [C++] Support negative toletance in `AsofJoinNode`
[ https://issues.apache.org/jira/browse/ARROW-18427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaron Gvili updated ARROW-18427: Summary: [C++] Support negative toletance in `AsofJoinNode` (was: [C++] Suppose negative toletance in `AsofJoinNode`) > [C++] Support negative toletance in `AsofJoinNode` > -- > > Key: ARROW-18427 > URL: https://issues.apache.org/jira/browse/ARROW-18427 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Yaron Gvili >Assignee: Yaron Gvili >Priority: Major > > Currently, `AsofJoinNode` supports a tolerance that is non-negative, allowing > past-joining, i.e., joining right-table rows with a timestamp at or before > that of the left-table row. This issue will add support for a positive > tolerance, which would allow future-joining too. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-18427) [C++] Suppose negative toletance in `AsofJoinNode`
Yaron Gvili created ARROW-18427: --- Summary: [C++] Suppose negative toletance in `AsofJoinNode` Key: ARROW-18427 URL: https://issues.apache.org/jira/browse/ARROW-18427 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Yaron Gvili Assignee: Yaron Gvili Currently, `AsofJoinNode` supports a tolerance that is non-negative, allowing past-joining, i.e., joining right-table rows with a timestamp at or before that of the left-table row. This issue will add support for a positive tolerance, which would allow future-joining too. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18003) [Python] Add sort_by to RecordBatch
[ https://issues.apache.org/jira/browse/ARROW-18003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-18003: -- Labels: good-first-issue (was: ) > [Python] Add sort_by to RecordBatch > --- > > Key: ARROW-18003 > URL: https://issues.apache.org/jira/browse/ARROW-18003 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Alessandro Molina >Priority: Major > Labels: good-first-issue > Fix For: 11.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ARROW-18003) [Python] Add sort_by to RecordBatch
[ https://issues.apache.org/jira/browse/ARROW-18003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-18003: -- Summary: [Python] Add sort_by to RecordBatch (was: [Python] Add sort_by to Table and RecordBatch) > [Python] Add sort_by to RecordBatch > --- > > Key: ARROW-18003 > URL: https://issues.apache.org/jira/browse/ARROW-18003 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Alessandro Molina >Priority: Major > Fix For: 11.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
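The requested behavior can be sketched in plain Python over a mapping of column names to value lists: compute a sort permutation from the key column, then take every column in that order. The `sort_by` function here is a made-up helper mirroring the proposed RecordBatch method, not pyarrow's API.

```python
def sort_by(columns, key, descending=False):
    """Reorder every column by the values of columns[key]: compute a
    sort permutation, then take all columns in that order."""
    order = sorted(range(len(columns[key])),
                   key=columns[key].__getitem__, reverse=descending)
    return {name: [values[i] for i in order]
            for name, values in columns.items()}
```

For example, sorting `{"a": [3, 1, 2], "b": ["x", "y", "z"]}` by `"a"` reorders both columns by the permutation `[1, 2, 0]`.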
[jira] [Closed] (ARROW-18402) [C++] Expose `DeclarationInfo`
[ https://issues.apache.org/jira/browse/ARROW-18402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace closed ARROW-18402. --- Resolution: Fixed Fixed by PR https://github.com/apache/arrow/pull/14765 > [C++] Expose `DeclarationInfo` > -- > > Key: ARROW-18402 > URL: https://issues.apache.org/jira/browse/ARROW-18402 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Yaron Gvili >Assignee: Yaron Gvili >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > `DeclarationInfo` is just a pair of `Declaration` and `Schema`, which are > public APIs, and so can be made public API itself. This can be part of or a > follow-up on [https://github.com/apache/arrow/pull/14485], and will allow > implementing extension providers, whose API depends on `DeclarationInfo`, > outside of the Arrow repo. -- This message was sent by Atlassian Jira (v8.20.10#820010)