[jira] [Updated] (ARROW-11354) [Rust] Speed-up casts of dates and times
[ https://issues.apache.org/jira/browse/ARROW-11354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11354: --- Labels: pull-request-available (was: ) > [Rust] Speed-up casts of dates and times > > > Key: ARROW-11354 > URL: https://issues.apache.org/jira/browse/ARROW-11354 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Jorge Leitão >Assignee: Jorge Leitão >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11354) [Rust] Speed-up casts of dates and times
Jorge Leitão created ARROW-11354: Summary: [Rust] Speed-up casts of dates and times Key: ARROW-11354 URL: https://issues.apache.org/jira/browse/ARROW-11354 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Jorge Leitão Assignee: Jorge Leitão -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-10605) [C++][Gandiva] Support Decimal256 type in gandiva computation.
[ https://issues.apache.org/jira/browse/ARROW-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270546#comment-17270546 ] Micah Kornfield commented on ARROW-10605: - sorry for the delayed reply. [~klykov] this is mostly looking into what operations gandiva currently supports and replicating them for Decimal256 (there are still some basic math/logic operations that aren't supported). Probably a few sub-work items here. > [C++][Gandiva] Support Decimal256 type in gandiva computation. > -- > > Key: ARROW-10605 > URL: https://issues.apache.org/jira/browse/ARROW-10605 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Gandiva >Reporter: Micah Kornfield >Priority: Major > > There might be a lot of work here, so sub-jiras might be added once scope is > determined. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11179) [Format] Make comments in fb files friendly to rust doc
[ https://issues.apache.org/jira/browse/ARROW-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270543#comment-17270543 ] Micah Kornfield commented on ARROW-11179: - It is OK with me if you want to open a PR. I don't think we rely on the formatting for other languages, but I could be wrong. [~uwe] or [~apitrou] might know better.
> [Format] Make comments in fb files friendly to rust doc
> ---
> Key: ARROW-11179
> URL: https://issues.apache.org/jira/browse/ARROW-11179
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Qingyou Meng
> Priority: Trivial
> Attachments: format-0ed34c83.patch
>
> Currently, comments in flatbuffer files are copied directly into the generated Rust and C++ source code. That works, but there are some problems with cargo doc:
> * an array element like abc[1] or a link label like [smith2017knl] causes a `broken intra doc links` warning
> * example code/figure blocks are flattened into one line; see the [arrow 2.0.0 doc|https://docs.rs/arrow/2.0.0/arrow/ipc/gen/SparseTensor/struct.SparseTensorIndexCSF.html#method.indptrType]
> After flatc generates the code, those ipc files have to be updated manually to fix the above problems, so I'm suggesting updating the flatbuffer comments to address this:
> * escape inline code with ` and `
> * escape text blocks with ```text and ```
> * change [smith2017knl]: [http://shaden.io/pub-files/smith2017knl.pdf] to [smith2017knl]([http://shaden.io/pub-files/smith2017knl.pdf])
> The attached file *format-0ed34c83.patch* was created with the git command
> {code:java}
> git diff 0ed34c83 -p format > format-0ed34c83.patch{code}
> where *0ed34c83* is this commit:
> {noformat}
> 0ed34c83c ARROW-9400: [Python] Do not depend on conda-forge static libraries in Windows wheel builds{noformat}
> [~emkornfield] may I create a pull request for this?
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11353) [C++][Python][Parquet] We should allow for overriding to large types by providing a schema
Micah Kornfield created ARROW-11353: --- Summary: [C++][Python][Parquet] We should allow for overriding to large types by providing a schema Key: ARROW-11353 URL: https://issues.apache.org/jira/browse/ARROW-11353 Project: Apache Arrow Issue Type: Bug Components: C++, Python Reporter: Micah Kornfield

The following shouldn't throw:

{code:python}
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> import pyarrow.dataset as ds
>>> pa.__version__
'2.0.0'
>>> schema = pa.schema([pa.field("utf8", pa.utf8())])
>>> table = pa.Table.from_pydict({"utf8": ["foo", "bar"]}, schema)
>>> pq.write_table(table, "/tmp/example.parquet")
>>> large_schema = pa.schema([pa.field("utf8", pa.large_utf8())])
>>> ds.dataset("/tmp/example.parquet", schema=large_schema, format="parquet").to_table()
Traceback (most recent call last):
  File "", line 1, in 
  File "pyarrow/_dataset.pyx", line 405, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2262, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 107, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: fields had matching names but differing types.
From: utf8: string To: utf8: large_string
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
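The type mismatch in the error comes down to the offset buffers: utf8 arrays index their character data with 32-bit offsets while large_utf8 uses 64-bit offsets, so the two layouts are not interchangeable without a cast. A minimal stdlib-only sketch of that difference (illustrative only, not pyarrow's internals):

```python
import struct

def make_offsets(strings, fmt):
    """Build Arrow-style offsets: fmt "i" (int32) for utf8, "q" (int64) for large_utf8."""
    offsets = [0]
    for s in strings:
        offsets.append(offsets[-1] + len(s.encode()))
    return struct.pack(f"<{len(offsets)}{fmt}", *offsets)

small = make_offsets(["foo", "bar"], "i")  # 3 offsets x 4 bytes = 12 bytes
large = make_offsets(["foo", "bar"], "q")  # 3 offsets x 8 bytes = 24 bytes
print(len(small), len(large))
```

Same strings, different physical layout, which is why the reader would have to cast rather than simply reinterpret the file's string column as large_string.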
[jira] [Updated] (ARROW-11066) [Java] Is there a bug in flight AddWritableBuffer
[ https://issues.apache.org/jira/browse/ARROW-11066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Li updated ARROW-11066: - Fix Version/s: 4.0.0
> [Java] Is there a bug in flight AddWritableBuffer
> -
> Key: ARROW-11066
> URL: https://issues.apache.org/jira/browse/ARROW-11066
> Project: Apache Arrow
> Issue Type: Bug
> Components: FlightRPC, Java
> Affects Versions: 1.0.0
> Reporter: Kangping Huang
> Assignee: David Li
> Priority: Major
> Fix For: 4.0.0
>
> [https://github.com/apache/arrow/blob/9bab12f03ac486bb8270f031b83f0a0411766b3e/java/flight/flight-core/src/main/java/org/apache/arrow/flight/grpc/AddWritableBuffer.java#L94]
> buf.readBytes(stream, buf.readableBytes());
> Is this line redundant?
> In my perf.svg, this line copies the data from buf to the OutputStream, which defeats zero-copy.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11066) [Java] Is there a bug in flight AddWritableBuffer
[ https://issues.apache.org/jira/browse/ARROW-11066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270500#comment-17270500 ] David Li commented on ARROW-11066: -- Indeed, you seem to be right, and furthermore, that line seems to defeat the very optimization the method purports to implement! The error seems to have been present since the original Flight implementation. I'd surmise it was a bad refactor or a half-completed attempt at making {{AddWritableBuffer#add}} handle the fallback path for you.
> [Java] Is there a bug in flight AddWritableBuffer
> -
> Key: ARROW-11066
> URL: https://issues.apache.org/jira/browse/ARROW-11066
> Project: Apache Arrow
> Issue Type: Bug
> Components: FlightRPC, Java
> Affects Versions: 1.0.0
> Reporter: Kangping Huang
> Priority: Major
>
> [https://github.com/apache/arrow/blob/9bab12f03ac486bb8270f031b83f0a0411766b3e/java/flight/flight-core/src/main/java/org/apache/arrow/flight/grpc/AddWritableBuffer.java#L94]
> buf.readBytes(stream, buf.readableBytes());
> Is this line redundant?
> In my perf.svg, this line copies the data from buf to the OutputStream, which defeats zero-copy.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
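The redundancy under discussion can be illustrated with a small, self-contained sketch in plain java.io (the names and types below are stand-ins, not the actual Netty/gRPC classes Flight uses): once the buffer has been attached by reference, streaming its bytes into the OutputStream copies the same payload a second time.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class ZeroCopySketch {
    // Stands in for the gRPC composite buffer that bytes get attached to.
    static final List<byte[]> attachedBuffers = new ArrayList<>();

    // The zero-copy path: attach the buffer by reference, copying nothing.
    static boolean tryAddWritableBuffer(byte[] buf) {
        attachedBuffers.add(buf);
        return true;
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3, 4};
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        if (tryAddWritableBuffer(payload)) {
            // The suspect line: copying after a successful zero-copy attach
            // means the payload is written out twice.
            stream.write(payload, 0, payload.length); // ~ buf.readBytes(stream, n)
        }
        System.out.println("attached: " + attachedBuffers.size()
                + " buffer(s), bytes copied anyway: " + stream.size());
    }
}
```

If the attach succeeds, the copy should be skipped; running both, as the sketch does, is the behavior the issue reports.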
[jira] [Assigned] (ARROW-11066) [Java] Is there a bug in flight AddWritableBuffer
[ https://issues.apache.org/jira/browse/ARROW-11066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Li reassigned ARROW-11066: Assignee: David Li
> [Java] Is there a bug in flight AddWritableBuffer
> -
> Key: ARROW-11066
> URL: https://issues.apache.org/jira/browse/ARROW-11066
> Project: Apache Arrow
> Issue Type: Bug
> Components: FlightRPC, Java
> Affects Versions: 1.0.0
> Reporter: Kangping Huang
> Assignee: David Li
> Priority: Major
>
> [https://github.com/apache/arrow/blob/9bab12f03ac486bb8270f031b83f0a0411766b3e/java/flight/flight-core/src/main/java/org/apache/arrow/flight/grpc/AddWritableBuffer.java#L94]
> buf.readBytes(stream, buf.readableBytes());
> Is this line redundant?
> In my perf.svg, this line copies the data from buf to the OutputStream, which defeats zero-copy.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11347) [JavaScript] Consider Objects instead of Maps
[ https://issues.apache.org/jira/browse/ARROW-11347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270480#comment-17270480 ] Brian Hulette commented on ARROW-11347: --- Ah, you mean when accessing a Row, e.g. table.get(0). I _think_ the choice of Map was for code reuse between Struct vectors and Map vectors ([~paul.e.taylor] wrote this, he could comment more certainly). Note I also added the ability to access the fields in a row view "by attribute" (in Python parlance) in https://github.com/apache/arrow/pull/2197, so if you have a table with a "foo" field you can access it in a Row view with either table.get(0)["foo"] or table.get(0).foo. I'm pretty sure I added that in response to a perf measurement from Jeff back in 2018.
> [JavaScript] Consider Objects instead of Maps
> -
> Key: ARROW-11347
> URL: https://issues.apache.org/jira/browse/ARROW-11347
> Project: Apache Arrow
> Issue Type: Improvement
> Components: JavaScript
> Reporter: Dominik Moritz
> Priority: Major
> Labels: performance
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> A quick experiment (https://observablehq.com/@domoritz/performance-of-maps-vs-objects) seems to show that object accesses are a lot faster than map accesses. Would it make sense to switch to objects in the row API to improve performance?
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11352) Implementation status?
Dominik Moritz created ARROW-11352: -- Summary: Implementation status? Key: ARROW-11352 URL: https://issues.apache.org/jira/browse/ARROW-11352 Project: Apache Arrow Issue Type: Task Components: Rust Reporter: Dominik Moritz https://arrow.apache.org/docs/status.html says that the Rust implementation doesn't support anything except CSV R/W. Is that true? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11351) Reconsider proxy objects instead of defineProperty
[ https://issues.apache.org/jira/browse/ARROW-11351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominik Moritz updated ARROW-11351: --- Description: I was wondering why Arrow uses Proxy objects instead of defineProperty, which was a bit faster in the experiments at https://observablehq.com/@jheer/from-apache-arrow-to-javascript-objects. I don't know whether a change makes sense but I would love to know the design rationale since I couldn't find anything in the issues or on GitHub about it. (was: Related to https://issues.apache.org/jira/browse/ARROW-11347 I was wondering why Arrow uses Proxy objects instead of defineProperty, which was a bit faster in the experiments at https://observablehq.com/@jheer/from-apache-arrow-to-javascript-objects. I don't know whether a change makes sense but I would love to know the design rationale since I couldn't find anything in the issues or on GitHub about it. ) > Reconsider proxy objects instead of defineProperty > -- > > Key: ARROW-11351 > URL: https://issues.apache.org/jira/browse/ARROW-11351 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Dominik Moritz >Priority: Major > > I was wondering why Arrow uses Proxy objects instead of defineProperty, which > was a bit faster in the experiments at > https://observablehq.com/@jheer/from-apache-arrow-to-javascript-objects. I > don't know whether a change makes sense but I would love to know the design > rationale since I couldn't find anything in the issues or on GitHub about it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11351) Reconsider proxy objects instead of defineProperty
Dominik Moritz created ARROW-11351: -- Summary: Reconsider proxy objects instead of defineProperty Key: ARROW-11351 URL: https://issues.apache.org/jira/browse/ARROW-11351 Project: Apache Arrow Issue Type: Improvement Components: JavaScript Reporter: Dominik Moritz Related to https://issues.apache.org/jira/browse/ARROW-11347 I was wondering why Arrow uses Proxy objects instead of defineProperty, which was a bit faster in the experiments at https://observablehq.com/@jheer/from-apache-arrow-to-javascript-objects. I don't know whether a change makes sense but I would love to know the design rationale since I couldn't find anything in the issues or on GitHub about it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
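To make the comparison concrete, here is a hedged sketch of the two strategies being weighed (illustrative names only, not the actual Arrow row implementation): a Proxy intercepts every property access dynamically, while defineProperty installs per-field getters once up front.

```javascript
// Column data shared by both row views.
const names = ["foo", "bar"];
const values = [1, 2];

// Strategy 1: a Proxy resolves every property access at lookup time.
const proxyRow = new Proxy({}, {
  get(_, key) {
    const i = names.indexOf(key);
    return i >= 0 ? values[i] : undefined;
  },
});

// Strategy 2: Object.defineProperty installs a getter once per field,
// so later accesses go through an ordinary property lookup.
const definedRow = {};
names.forEach((name, i) =>
  Object.defineProperty(definedRow, name, { get: () => values[i] })
);

console.log(proxyRow.foo, definedRow.bar);
```

The trade-off in the linked experiments is roughly setup cost (defineProperty pays per field, per row view) versus access cost (the Proxy trap runs on every read).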
[jira] [Commented] (ARROW-11347) [JavaScript] Consider Objects instead of Maps
[ https://issues.apache.org/jira/browse/ARROW-11347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270450#comment-17270450 ] Dominik Moritz commented on ARROW-11347: Yes, when accessing an element of an array (e.g. after `toArray()`). Before making the change, someone needs to look closer into the performance benefits and usability. Jeff created his own parser for Arquero, which can make some simplifying assumptions and is less general but also almost twice as fast (https://github.com/uwdata/arquero-arrow/tree/main/perf). It would be good to figure out why. I think Maps are generally nicer than Objects for users so maybe it's worth the performance difference. It would be great if you could share how you decided on Maps in the first place. > [JavaScript] Consider Objects instead of Maps > - > > Key: ARROW-11347 > URL: https://issues.apache.org/jira/browse/ARROW-11347 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Dominik Moritz >Priority: Major > Labels: performance > Original Estimate: 24h > Remaining Estimate: 24h > > A quick experiment > (https://observablehq.com/@domoritz/performance-of-maps-vs-objects) seems to > show that object accesses are a lot faster than map accesses. Would it make > sense to switch to objects in the row API to improve performance? -- This message was sent by Atlassian Jira (v8.3.4#803005)
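For anyone wanting to reproduce the claim locally, a minimal Node micro-benchmark in the spirit of the linked notebook (names are illustrative; a real comparison should use a harness that accounts for JIT warm-up and dead-code elimination):

```javascript
const N = 1_000_000;
const obj = { foo: 1, bar: 2 };
const map = new Map([["foo", 1], ["bar", 2]]);

// Time N reads; keep a running sum so the reads are not optimized away.
function bench(read) {
  const t0 = process.hrtime.bigint();
  let sum = 0;
  for (let i = 0; i < N; i++) sum += read();
  const ns = Number(process.hrtime.bigint() - t0);
  return { ns, sum };
}

const objRes = bench(() => obj.foo);        // plain property access
const mapRes = bench(() => map.get("foo")); // Map lookup
console.log(`object: ${objRes.ns} ns, map: ${mapRes.ns} ns`);
```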
[jira] [Updated] (ARROW-11350) [C++] Bump dependency versions
[ https://issues.apache.org/jira/browse/ARROW-11350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11350: --- Labels: pull-request-available (was: ) > [C++] Bump dependency versions > -- > > Key: ARROW-11350 > URL: https://issues.apache.org/jira/browse/ARROW-11350 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11350) [C++] Bump dependency versions
Neal Richardson created ARROW-11350: --- Summary: [C++] Bump dependency versions Key: ARROW-11350 URL: https://issues.apache.org/jira/browse/ARROW-11350 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Neal Richardson Assignee: Neal Richardson Fix For: 4.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11075) [Python] Getting reference not found with ORC enabled pyarrow
[ https://issues.apache.org/jira/browse/ARROW-11075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270419#comment-17270419 ] Uwe Korn commented on ARROW-11075: -- The latest ORC release is supporting shared linkage and the conda toolchain has been reworked to link dynamically: https://github.com/conda-forge/arrow-cpp-feedstock/blob/1.0.x/recipe/meta.yaml. The major issue here is probably that ORC 0.6.2 is built as part of the Arrow thirdparty toolchain but 0.6.6 headers are used during the build. Not sure how this links but that feels like the most likely issue to me. > [Python] Getting reference not found with OCR enabled pyarrow > - > > Key: ARROW-11075 > URL: https://issues.apache.org/jira/browse/ARROW-11075 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 1.0.1 > Environment: PPC64LE >Reporter: Kandarpa >Priority: Major > Attachments: arrow_cpp_build.log, arrow_python_build.log, > conda_list.txt > > > Generated the pyarrow with OCR enabled on Power using following steps: > {code:java} > export ARROW_HOME=$CONDA_PREFIX > mkdir cpp/build > cd cpp/build > cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \ > -DCMAKE_INSTALL_LIBDIR=lib \ > -DARROW_WITH_BZ2=ON \ > -DARROW_WITH_ZLIB=ON \ > -DARROW_WITH_ZSTD=ON \ > -DARROW_WITH_LZ4=ON \ > -DARROW_WITH_SNAPPY=ON \ > -DARROW_WITH_BROTLI=ON \ > -DARROW_PARQUET=ON \ > -DARROW_PYTHON=ON \ > -DARROW_BUILD_TESTS=ON \ > -DARROW_CUDA=ON \ > -DCUDA_CUDA_LIBRARY=/usr/local/cuda/lib64/stubs/libcuda.so \ > -DARROW_ORC=ON \ > .. 
> make -j > make install > cd ../../python > python setup.py build_ext --bundle-arrow-cpp --with-orc --with-cuda > --with-parquet bdist_wheel > {code} > > > With the generated whl package installed, ran CUDF tests and observed > following error: > *_ERROR cudf - ImportError: > /conda/envs/rmm/lib/python3.7/site-packages/pyarrow/_orc.cpython-37m-powerpc64le-linux-gnu.so: > undefined symbol: _ZN5arrow8adapters3orc13OR..._* > Please find the whole error log below: > > ERRORS > > ERROR > collecting test session > _ > /conda/envs/rmm/lib/python3.7/importlib/__init__.py:127: in import_module > return _bootstrap._gcd_import(name[level:], package, level) > :1006: in _gcd_import > ??? > :983: in _find_and_load > ??? > :953: in _find_and_load_unlocked > ??? > :219: in _call_with_frames_removed > ??? > :1006: in _gcd_import > ??? > :983: in _find_and_load > ??? > :953: in _find_and_load_unlocked > ??? > :219: in _call_with_frames_removed > ??? > :1006: in _gcd_import > ??? > :983: in _find_and_load > ??? > :967: in _find_and_load_unlocked > ??? > :677: in _load_unlocked > ??? > :728: in exec_module > ??? > :219: in _call_with_frames_removed > ??? 
> cudf/cudf/__init__.py:60: in > from cudf.io import ( > cudf/cudf/io/__init__.py:8: in > from cudf.io.orc import read_orc, read_orc_metadata, to_orc > cudf/cudf/io/orc.py:6: in > from pyarrow import orc as orc > /conda/envs/rmm/lib/python3.7/site-packages/pyarrow/orc.py:24: in > import pyarrow._orc as _orc > {color:#de350b}E ImportError: > /conda/envs/rmm/lib/python3.7/site-packages/pyarrow/_orc.cpython-37m-powerpc64le-linux-gnu.so: > undefined symbol: > _ZN5arrow8adapters3orc13ORCFileReader4ReadEPSt10shared_ptrINS_5TableEE{color} > === > short test summary info > > *_ERROR cudf - ImportError: > /conda/envs/rmm/lib/python3.7/site-packages/pyarrow/_orc.cpython-37m-powerpc64le-linux-gnu.so: > undefined symbol: _ZN5arrow8adapters3orc13OR..._* > > Interrupted: 1 error during collection > > === > 1 error in 1.54s > === > Fatal Python error: Segmentation fault -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11075) [Python] Getting reference not found with ORC enabled pyarrow
[ https://issues.apache.org/jira/browse/ARROW-11075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270414#comment-17270414 ] Wes McKinney commented on ARROW-11075: -- ORC is supposed to be statically linked, so this would be unusual. [~kandarpamalipeddi] can you show what ORC symbols are in your shared library?
{code}
nm -D /path/to/libarrow.so | c++filt | grep orc
{code}
Check also which libarrow.so the pyarrow libraries are linking to if you can (with {{ldd}}).
> [Python] Getting reference not found with ORC enabled pyarrow
> -
> Key: ARROW-11075
> URL: https://issues.apache.org/jira/browse/ARROW-11075
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 1.0.1
> Environment: PPC64LE
> Reporter: Kandarpa
> Priority: Major
> Attachments: arrow_cpp_build.log, arrow_python_build.log, conda_list.txt
>
> Generated the pyarrow with ORC enabled on Power using the following steps:
> {code:java}
> export ARROW_HOME=$CONDA_PREFIX
> mkdir cpp/build
> cd cpp/build
> cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
> -DCMAKE_INSTALL_LIBDIR=lib \
> -DARROW_WITH_BZ2=ON \
> -DARROW_WITH_ZLIB=ON \
> -DARROW_WITH_ZSTD=ON \
> -DARROW_WITH_LZ4=ON \
> -DARROW_WITH_SNAPPY=ON \
> -DARROW_WITH_BROTLI=ON \
> -DARROW_PARQUET=ON \
> -DARROW_PYTHON=ON \
> -DARROW_BUILD_TESTS=ON \
> -DARROW_CUDA=ON \
> -DCUDA_CUDA_LIBRARY=/usr/local/cuda/lib64/stubs/libcuda.so \
> -DARROW_ORC=ON \
> .. 
> make -j > make install > cd ../../python > python setup.py build_ext --bundle-arrow-cpp --with-orc --with-cuda > --with-parquet bdist_wheel > {code} > > > With the generated whl package installed, ran CUDF tests and observed > following error: > *_ERROR cudf - ImportError: > /conda/envs/rmm/lib/python3.7/site-packages/pyarrow/_orc.cpython-37m-powerpc64le-linux-gnu.so: > undefined symbol: _ZN5arrow8adapters3orc13OR..._* > Please find the whole error log below: > > ERRORS > > ERROR > collecting test session > _ > /conda/envs/rmm/lib/python3.7/importlib/__init__.py:127: in import_module > return _bootstrap._gcd_import(name[level:], package, level) > :1006: in _gcd_import > ??? > :983: in _find_and_load > ??? > :953: in _find_and_load_unlocked > ??? > :219: in _call_with_frames_removed > ??? > :1006: in _gcd_import > ??? > :983: in _find_and_load > ??? > :953: in _find_and_load_unlocked > ??? > :219: in _call_with_frames_removed > ??? > :1006: in _gcd_import > ??? > :983: in _find_and_load > ??? > :967: in _find_and_load_unlocked > ??? > :677: in _load_unlocked > ??? > :728: in exec_module > ??? > :219: in _call_with_frames_removed > ??? 
> cudf/cudf/__init__.py:60: in > from cudf.io import ( > cudf/cudf/io/__init__.py:8: in > from cudf.io.orc import read_orc, read_orc_metadata, to_orc > cudf/cudf/io/orc.py:6: in > from pyarrow import orc as orc > /conda/envs/rmm/lib/python3.7/site-packages/pyarrow/orc.py:24: in > import pyarrow._orc as _orc > {color:#de350b}E ImportError: > /conda/envs/rmm/lib/python3.7/site-packages/pyarrow/_orc.cpython-37m-powerpc64le-linux-gnu.so: > undefined symbol: > _ZN5arrow8adapters3orc13ORCFileReader4ReadEPSt10shared_ptrINS_5TableEE{color} > === > short test summary info > > *_ERROR cudf - ImportError: > /conda/envs/rmm/lib/python3.7/site-packages/pyarrow/_orc.cpython-37m-powerpc64le-linux-gnu.so: > undefined symbol: _ZN5arrow8adapters3orc13OR..._* > > Interrupted: 1 error during collection > > === > 1 error in 1.54s > === > Fatal Python error: Segmentation fault -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11075) [Python] Getting reference not found with ORC enabled pyarrow
[ https://issues.apache.org/jira/browse/ARROW-11075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270411#comment-17270411 ] Uwe Korn commented on ARROW-11075: -- I would guess that the issue is related to {{-DORC_SOURCE=BUNDLED}} and having {{orc}} installed as a conda package at the same time. Can you remove the {{-DORC_SOURCE=BUNDLED}} flag and do a clean build? Do you know why you have set that? > [Python] Getting reference not found with OCR enabled pyarrow > - > > Key: ARROW-11075 > URL: https://issues.apache.org/jira/browse/ARROW-11075 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 1.0.1 > Environment: PPC64LE >Reporter: Kandarpa >Priority: Major > Attachments: arrow_cpp_build.log, arrow_python_build.log, > conda_list.txt > > > Generated the pyarrow with OCR enabled on Power using following steps: > {code:java} > export ARROW_HOME=$CONDA_PREFIX > mkdir cpp/build > cd cpp/build > cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \ > -DCMAKE_INSTALL_LIBDIR=lib \ > -DARROW_WITH_BZ2=ON \ > -DARROW_WITH_ZLIB=ON \ > -DARROW_WITH_ZSTD=ON \ > -DARROW_WITH_LZ4=ON \ > -DARROW_WITH_SNAPPY=ON \ > -DARROW_WITH_BROTLI=ON \ > -DARROW_PARQUET=ON \ > -DARROW_PYTHON=ON \ > -DARROW_BUILD_TESTS=ON \ > -DARROW_CUDA=ON \ > -DCUDA_CUDA_LIBRARY=/usr/local/cuda/lib64/stubs/libcuda.so \ > -DARROW_ORC=ON \ > .. 
> make -j > make install > cd ../../python > python setup.py build_ext --bundle-arrow-cpp --with-orc --with-cuda > --with-parquet bdist_wheel > {code} > > > With the generated whl package installed, ran CUDF tests and observed > following error: > *_ERROR cudf - ImportError: > /conda/envs/rmm/lib/python3.7/site-packages/pyarrow/_orc.cpython-37m-powerpc64le-linux-gnu.so: > undefined symbol: _ZN5arrow8adapters3orc13OR..._* > Please find the whole error log below: > > ERRORS > > ERROR > collecting test session > _ > /conda/envs/rmm/lib/python3.7/importlib/__init__.py:127: in import_module > return _bootstrap._gcd_import(name[level:], package, level) > :1006: in _gcd_import > ??? > :983: in _find_and_load > ??? > :953: in _find_and_load_unlocked > ??? > :219: in _call_with_frames_removed > ??? > :1006: in _gcd_import > ??? > :983: in _find_and_load > ??? > :953: in _find_and_load_unlocked > ??? > :219: in _call_with_frames_removed > ??? > :1006: in _gcd_import > ??? > :983: in _find_and_load > ??? > :967: in _find_and_load_unlocked > ??? > :677: in _load_unlocked > ??? > :728: in exec_module > ??? > :219: in _call_with_frames_removed > ??? 
> cudf/cudf/__init__.py:60: in > from cudf.io import ( > cudf/cudf/io/__init__.py:8: in > from cudf.io.orc import read_orc, read_orc_metadata, to_orc > cudf/cudf/io/orc.py:6: in > from pyarrow import orc as orc > /conda/envs/rmm/lib/python3.7/site-packages/pyarrow/orc.py:24: in > import pyarrow._orc as _orc > {color:#de350b}E ImportError: > /conda/envs/rmm/lib/python3.7/site-packages/pyarrow/_orc.cpython-37m-powerpc64le-linux-gnu.so: > undefined symbol: > _ZN5arrow8adapters3orc13ORCFileReader4ReadEPSt10shared_ptrINS_5TableEE{color} > === > short test summary info > > *_ERROR cudf - ImportError: > /conda/envs/rmm/lib/python3.7/site-packages/pyarrow/_orc.cpython-37m-powerpc64le-linux-gnu.so: > undefined symbol: _ZN5arrow8adapters3orc13OR..._* > > Interrupted: 1 error during collection > > === > 1 error in 1.54s > === > Fatal Python error: Segmentation fault -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-11299) [Python] build warning in python
[ https://issues.apache.org/jira/browse/ARROW-11299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Kietzman resolved ARROW-11299. -- Fix Version/s: (was: 4.0.0) 3.0.0 Resolution: Fixed Issue resolved by pull request 9274 [https://github.com/apache/arrow/pull/9274]
> [Python] build warning in python
> -
> Key: ARROW-11299
> URL: https://issues.apache.org/jira/browse/ARROW-11299
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 2.0.0
> Reporter: Yibo Cai
> Assignee: Yibo Cai
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.0.0
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> There are many warnings about compute kernel options when building Arrow Python. Removing the line below suppresses the warnings:
> https://github.com/apache/arrow/blob/140135908c5d131ceac31a0e529f9b9b763b1106/cpp/src/arrow/compute/function.h#L45
> I think the reason is that the virtual destructor makes the structure non-standard-layout (and thus not C compatible), so the offsetof macro cannot be used on it safely. As the function options are straightforward, it looks like the destructor is not necessary. [~bkietz]
> *Steps to reproduce*
> Build arrow cpp:
> {code:bash}
> ~/arrow/cpp/release $ cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DARROW_COMPUTE=ON -DARROW_BUILD_TESTS=ON -DCMAKE_INSTALL_PREFIX=$(pwd)/_install -DCMAKE_INSTALL_LIBDIR=lib -DARROW_PYTHON=ON -DCMAKE_CXX_COMPILER=/usr/bin/clang++-9 -DCMAKE_C_COMPILER=/usr/bin/clang-9 ..
> ~/arrow/cpp/release $ ninja install
> {code}
> Build arrow python:
> {code:bash}
> ~/arrow/python $ python --version
> Python 3.6.9
> ~/arrow/python $ python setup.py build_ext --inplace
> .. 
> [ 93%] Building CXX object CMakeFiles/_compute.dir/_compute.cpp.o [27/1691] > In file included from > /usr/include/x86_64-linux-gnu/bits/types/stack_t.h:23:0, > from /usr/include/signal.h:303, > from > /home/cyb/archery/lib/python3.6/site-packages/numpy/core/include/numpy/npy_interrupt.h:84, > from > /home/cyb/archery/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:5, > from > /home/cyb/arrow/cpp/release/_install/include/arrow/python/numpy_interop.h:41, > from /home/cyb/arrow/cpp/release/_install/include/arrow/python/helpers.h:27, > from /home/cyb/arrow/cpp/release/_install/include/arrow/python/api.h:24, > from /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:696: > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp: In function > ‘int __Pyx_modinit_type_init_code()’: > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26034:146: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__CastOptions’ is undefined [-Winvalid-offsetof] > x_type_7pyarrow_8_compute__CastOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__CastOptions, __pyx_base.__pyx_base.__weakref__); > ^ > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26066:150: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__FilterOptions’ is undefined > [-Winvalid-offsetof] > type_7pyarrow_8_compute__FilterOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__FilterOptions, > __pyx_base.__pyx_base.__weakref__); > ^ > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26082:146: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__TakeOptions’ is undefined [-Winvalid-offsetof] > x_type_7pyarrow_8_compute__TakeOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__TakeOptions, __pyx_base.__pyx_base.__weakref__); > ^ > 
/home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26130:150: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__MinMaxOptions’ is undefined > [-Winvalid-offsetof] > type_7pyarrow_8_compute__MinMaxOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__MinMaxOptions, > __pyx_base.__pyx_base.__weakref__); > ^ > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26146:148: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__CountOptions’ is undefined [-Winvalid-offsetof] > _type_7pyarrow_8_compute__CountOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__CountOptions, > __pyx_base.__pyx_base.__weakref__); > ^ > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26162:146: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__ModeOptions’ is undefined [-Winvalid-offsetof] > x_type_7pyarrow_8_compute__ModeOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__ModeOptions, __pyx_base.__pyx_ba
[jira] [Updated] (ARROW-8919) [C++] Add "DispatchBest" APIs to compute::Function that selects a kernel that may require implicit casts to invoke
[ https://issues.apache.org/jira/browse/ARROW-8919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8919: -- Labels: compute pull-request-available (was: compute) > [C++] Add "DispatchBest" APIs to compute::Function that selects a kernel that > may require implicit casts to invoke > -- > > Key: ARROW-8919 > URL: https://issues.apache.org/jira/browse/ARROW-8919 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 2.0.0 >Reporter: Wes McKinney >Assignee: Ben Kietzman >Priority: Major > Labels: compute, pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently we have "DispatchExact" which requires an exact match of input > types. "DispatchBest" would permit kernel selection with implicit casts > required. Since multiple kernels may be valid when allowing implicit casts, > we will need to break ties by estimating the "cost" of the implicit casts. > For example, casting int8 to int32 is "less expensive" than implicitly > casting to int64 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11349) [Rust] Add from_iter_values to create arrays from T instead of Option
[ https://issues.apache.org/jira/browse/ARROW-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11349: --- Labels: pull-request-available (was: ) > [Rust] Add from_iter_values to create arrays from T instead of Option > > > Key: ARROW-11349 > URL: https://issues.apache.org/jira/browse/ARROW-11349 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Daniël Heres >Assignee: Daniël Heres >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In that case we don't have to allocate a null buffer / set bits, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11349) [Rust] Add from_iter_values to create arrays from T instead of Option
Daniël Heres created ARROW-11349: Summary: [Rust] Add from_iter_values to create arrays from T instead of Option Key: ARROW-11349 URL: https://issues.apache.org/jira/browse/ARROW-11349 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Daniël Heres Assignee: Daniël Heres In that case we don't have to allocate a null buffer / set bits, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
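The saving the issue describes can be illustrated with a small sketch (plain Python, not the arrow-rs API): building from plain values needs only a values buffer, while building from optional values must also allocate and populate a validity bitmap.

```python
# Illustrative sketch of from_iter_values vs from_iter; function names
# mirror the proposed Rust API but the implementation is hypothetical.

def from_iter_values(values):
    """Build from plain values: no null bitmap is allocated at all."""
    return list(values), None

def from_iter(options):
    """Build from optional values: a validity bitmap must track the nulls."""
    out, validity = [], []
    for v in options:
        validity.append(v is not None)
        out.append(0 if v is None else v)  # nulls get a placeholder slot
    return out, validity

vals, validity = from_iter_values([1, 2, 3])
assert validity is None                     # no null buffer, no bit-setting
vals, validity = from_iter([1, None, 3])
assert validity == [True, False, True]      # extra allocation and bookkeeping
```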
[jira] [Updated] (ARROW-11348) [C++] Add pretty printing support for gdb
[ https://issues.apache.org/jira/browse/ARROW-11348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace updated ARROW-11348: Description: Parsing the GDB output is error prone and can take considerable time. Also, some information is difficult or non-intuitive to get to (e.g. the name of a data type). We should add [GDB pretty printers|https://sourceware.org/gdb/onlinedocs/gdb/Pretty-Printing-API.html#Pretty-Printing-API] to improve the debug workflow for developers. This could assist not just Arrow developers but also developers using the Arrow C++ libs. (was: Parsing the GDB output is error prone and can take considerable time. Also, some information is difficult or non-intuitive to get to (e.g. the name of a data type). We should add GDB pretty printers[1] to improve the debug workflow for developers. This could assist not just Arrow developers but also developers using the Arrow C++ libs.) > [C++] Add pretty printing support for gdb > - > > Key: ARROW-11348 > URL: https://issues.apache.org/jira/browse/ARROW-11348 > Project: Apache Arrow > Issue Type: Wish >Reporter: Weston Pace >Priority: Major > > Parsing the GDB output is error prone and can take considerable time. Also, > some information is difficult or non-intuitive to get to (e.g. the name of a > data type). We should add [GDB pretty > printers|https://sourceware.org/gdb/onlinedocs/gdb/Pretty-Printing-API.html#Pretty-Printing-API] > to improve the debug workflow for developers. This could assist not just > Arrow developers but also developers using the Arrow C++ libs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-11348) [C++] Add pretty printing support for gdb
[ https://issues.apache.org/jira/browse/ARROW-11348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270269#comment-17270269 ] Weston Pace edited comment on ARROW-11348 at 1/22/21, 4:52 PM: --- I've made a first pass at this which improves things considerably. I will keep improving upon this and adding new information / features as I debug, and hopefully these scripts will be robust enough to merge at some point. If anyone is interested in helping develop these, they are located here: [https://github.com/westonpace/arrow/tree/feature/gdb-pretty-printers] To use the pretty printers you will need something like this in your .gdbinit
{code:python}
python
from pathlib import Path

def load_file(gdb_dir, filename):
    fullpath = str(gdb_dir / filename)
    print(f'Activating pretty printer {fullpath}')
    gdb.execute(f'source {fullpath}')

dir_ = Path('.').absolute()
while True:
    gdb_dir = dir_ / 'dev' / 'gdb'
    if gdb_dir.exists():
        print(f'Activating pretty printers found at {gdb_dir}')
        load_file(gdb_dir, 'find_stl.py')
        load_file(gdb_dir, 'pretty_printers.py')
        load_file(gdb_dir, 'commands.py')
        break
    if dir_ == Path('/'):
        print(f'Could not locate pretty printers')
        break
    dir_ = dir_.parent
end
{code}
This script will find the printers as long as you are in the arrow directory or a subdirectory when you run gdb. There is also a utility to try to find the STL pretty printers. These are found using conda, so you will need to be in a conda environment with the gxx_linux-64 package installed to find them. There is also a utility command `parr` which takes an "expression" and will attempt to use one of the arrow pretty print utilities to print the result of the expression.
Example commands: {code:java} p *by.data_ p (*(by.data())).child_data p *((*(by.data())).child_data[0]) p (*((*(by.data())).child_data[0])).buffers p *((*((*(by.data())).child_data[0])).buffers[1]) p *((*((*(by.data())).child_data[0])).buffers[2]) parr by {code} Output with pretty printers: {code:java} (gdb) $1 = ArrayData (type=DT("struct") length=8 offset=0 buffers=0x55715f68 child_data=0x55715f80) (gdb) $2 = std::vector of length 2, capacity 2 = {std::shared_ptr (use count 2, weak count 0) = {get() = 0x55713ff0}, std::shared_ptr (use count 2, weak count 0) = { get() = 0x55714070}} (gdb) $3 = ArrayData (type=DT("string") length=8 offset=0 buffers=0x55714018 child_data=0x55714030) (gdb) $4 = std::vector of length 3, capacity 3 = {std::shared_ptr (empty) = {get() = 0x0}, std::shared_ptr (use count 1, weak count 0) = {get() = 0x556a5b00}, std::shared_ptr (use count 1, weak count 0) = {get() = 0x556eee30}} (gdb) $5 = Buffer (size=36 capacity=64 data_addr=0x74209400 "") = {x00, x00, x00, x00, x02, x00, x00, x00, x04, x00, x00, x00, x07, x00, x00, x00, x09, x00, x00, x00, x0c, x00, x00, x00, x0e, x00, x00, x00, x10, x00, x00, x00, x13, x00, x00, x00} (gdb) $6 = Buffer (size=19 capacity=64 data_addr=0x74209080 "exexwhyexwhyexexwhy") = {x65, x78, x65, x78, x77, x68, x79, x65, x78, x77, x68, x79, x65, x78, x65, x78, x77, x68, x79} (gdb) -- is_valid: all not null -- child 0 type: string [ "ex", "ex", "why", "ex", "why", "ex", "ex", "why" ] -- child 1 type: int32 [ 0, 0, 0, 1, 0, 1, 0, 1 ] {code} Output without pretty printers: {code:java} (gdb) $1 = (std::__shared_ptr_access::element_type &) @0x55715f10: { type = {> = {> = {}, _M_ptr = 0x556eee70, _M_refcount = {_M_pi = 0x556eee60}}, }, length = 8, null_count = {> = {static _S_alignment = 8, _M_i = 0}, }, offset = 0, buffers = {, std::allocator > >> = { _M_impl = { >> = {<__gnu_cxx::new_allocator >> = {}, }, , std::allocator > >::_Vector_impl_data> = {_M_start = 0x557150b0, _M_finish = 0x557150c0, _M_end_of_storage = 
0x557150c0}, }}, }, child_data = {, std::allocator > >> = { _M_impl = { >> = {<__gnu_cxx::new_allocator >> = {}, }, , std::allocator > >::_Vector_impl_data> = {_M_start = 0x55714580, _M_finish = 0x557145a0, _M_end_of_storage = 0x557145a0}, }}, }, dictionary = {> = {> = {}, _M_ptr = 0x0, _M_refcount = {_M_pi = 0x0}}, }} (gdb) $2 = {, std::allocator > >> = { _M_impl = { >> = {<__gnu_cxx::new_allocator >> = {}, }, , std::allocator > >::_Vector_impl_data> = {_M_start = 0x55714580, _M_finish = 0x557145a0, _M_end_of_storage = 0x557145a0}, }}, } (gdb) $3 = (std::__shared_ptr_access::element_type &) @0x55713fc0: { type = {> = {> = {}, _M_ptr = 0x556a20f0, _M_refcount = {_M_pi = 0x556a20e0}}, }, length = 8, null_count = {> = {static _S_alignment = 8, _M_i = 0}, }, offset = 0, buffers = {
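For readers unfamiliar with the GDB pretty-printing API used above: a printer is just a class exposing a to_string() method. The sketch below shows only the shape of that API; the Buffer field names ("size_", "capacity_") are assumptions for illustration, not taken from the linked branch, and the class is written so it can be exercised with a plain dict standing in for a gdb.Value:

```python
# Minimal sketch of a GDB pretty printer for a Buffer-like struct.
# Field names are illustrative; the real printers live in the linked branch.

class BufferPrinter:
    def __init__(self, val):
        # Under gdb, `val` is a gdb.Value; anything indexable by
        # field name works, which lets us test this without gdb.
        self.val = val

    def to_string(self):
        return "Buffer (size={} capacity={})".format(
            self.val["size_"], self.val["capacity_"])

# Inside gdb, registration looks roughly like this:
#   import gdb.printing
#   pp = gdb.printing.RegexpCollectionPrettyPrinter("arrow")
#   pp.add_printer("Buffer", "^arrow::Buffer$", BufferPrinter)
#   gdb.printing.register_pretty_printer(gdb.current_objfile(), pp)

# Outside gdb, exercise it with a plain dict:
assert BufferPrinter({"size_": 19, "capacity_": 64}).to_string() == \
    "Buffer (size=19 capacity=64)"
```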
[jira] [Commented] (ARROW-11348) [C++] Add pretty printing support for gdb
[ https://issues.apache.org/jira/browse/ARROW-11348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270269#comment-17270269 ] Weston Pace commented on ARROW-11348: - I've made a first pass at this which improves things considerably. I will keep improving upon this and adding new information / features as I debug, and hopefully these scripts will be robust enough to merge at some point. If anyone is interested in helping develop these, they are located here: [https://github.com/westonpace/arrow/tree/feature/gdb-pretty-printers] To use the pretty printers you will need something like this in your .gdbinit
{code:python}
python
from pathlib import Path

def load_file(gdb_dir, filename):
    fullpath = str(gdb_dir / filename)
    print(f'Activating pretty printer {fullpath}')
    gdb.execute(f'source {fullpath}')

dir_ = Path('.').absolute()
while True:
    gdb_dir = dir_ / 'dev' / 'gdb'
    if gdb_dir.exists():
        print(f'Activating pretty printers found at {gdb_dir}')
        load_file(gdb_dir, 'find_stl.py')
        load_file(gdb_dir, 'pretty_printers.py')
        load_file(gdb_dir, 'commands.py')
        break
    if dir_ == Path('/'):
        print(f'Could not locate pretty printers')
        break
    dir_ = dir_.parent
end
{code}
This script will find the printers as long as you are in the arrow directory or a subdirectory when you run gdb. There is also a utility to try to find the STL pretty printers. These are found using conda, so you will need to be in a conda environment with the gxx_linux-64 package installed to find them. There is also a utility command `parr` which takes an "expression" and will attempt to use one of the arrow pretty print utilities to print the result of the expression.
Example commands: {code:java} p *by.data_ p (*(by.data())).child_data p *((*(by.data())).child_data[0]) p (*((*(by.data())).child_data[0])).buffers p *((*((*(by.data())).child_data[0])).buffers[1]) p *((*((*(by.data())).child_data[0])).buffers[2]) parr by {code} Output with pretty printers: {code:java} (gdb) $1 = ArrayData (type=DT("struct") length=8 offset=0 buffers=0x55715f68 child_data=0x55715f80) (gdb) $2 = std::vector of length 2, capacity 2 = {std::shared_ptr (use count 2, weak count 0) = {get() = 0x55713ff0}, std::shared_ptr (use count 2, weak count 0) = { get() = 0x55714070}} (gdb) $3 = ArrayData (type=DT("string") length=8 offset=0 buffers=0x55714018 child_data=0x55714030) (gdb) $4 = std::vector of length 3, capacity 3 = {std::shared_ptr (empty) = {get() = 0x0}, std::shared_ptr (use count 1, weak count 0) = {get() = 0x556a5b00}, std::shared_ptr (use count 1, weak count 0) = {get() = 0x556eee30}} (gdb) $5 = Buffer (size=36 capacity=64 data_addr=0x74209400 "") = {x00, x00, x00, x00, x02, x00, x00, x00, x04, x00, x00, x00, x07, x00, x00, x00, x09, x00, x00, x00, x0c, x00, x00, x00, x0e, x00, x00, x00, x10, x00, x00, x00, x13, x00, x00, x00} (gdb) $6 = Buffer (size=19 capacity=64 data_addr=0x74209080 "exexwhyexwhyexexwhy") = {x65, x78, x65, x78, x77, x68, x79, x65, x78, x77, x68, x79, x65, x78, x65, x78, x77, x68, x79} (gdb) -- is_valid: all not null -- child 0 type: string [ "ex", "ex", "why", "ex", "why", "ex", "ex", "why" ] -- child 1 type: int32 [ 0, 0, 0, 1, 0, 1, 0, 1 ] {code} Output without pretty printers: {code:java} (gdb) $1 = (std::__shared_ptr_access::element_type &) @0x55715f10: { type = {> = {> = {}, _M_ptr = 0x556eee70, _M_refcount = {_M_pi = 0x556eee60}}, }, length = 8, null_count = {> = {static _S_alignment = 8, _M_i = 0}, }, offset = 0, buffers = {, std::allocator > >> = { _M_impl = { >> = {<__gnu_cxx::new_allocator >> = {}, }, , std::allocator > >::_Vector_impl_data> = {_M_start = 0x557150b0, _M_finish = 0x557150c0, _M_end_of_storage = 
0x557150c0}, }}, }, child_data = {, std::allocator > >> = { _M_impl = { >> = {<__gnu_cxx::new_allocator >> = {}, }, , std::allocator > >::_Vector_impl_data> = {_M_start = 0x55714580, _M_finish = 0x557145a0, _M_end_of_storage = 0x557145a0}, }}, }, dictionary = {> = {> = {}, _M_ptr = 0x0, _M_refcount = {_M_pi = 0x0}}, }} (gdb) $2 = {, std::allocator > >> = { _M_impl = { >> = {<__gnu_cxx::new_allocator >> = {}, }, , std::allocator > >::_Vector_impl_data> = {_M_start = 0x55714580, _M_finish = 0x557145a0, _M_end_of_storage = 0x557145a0}, }}, } (gdb) $3 = (std::__shared_ptr_access::element_type &) @0x55713fc0: { type = {> = {> = {}, _M_ptr = 0x556a20f0, _M_refcount = {_M_pi = 0x556a20e0}}, }, length = 8, null_count = {> = {static _S_alignment = 8, _M_i = 0}, }, offset = 0, buffers = {, std::allocator > >> = { _M_imp
[jira] [Created] (ARROW-11348) [C++] Add pretty printing support for gdb
Weston Pace created ARROW-11348: --- Summary: [C++] Add pretty printing support for gdb Key: ARROW-11348 URL: https://issues.apache.org/jira/browse/ARROW-11348 Project: Apache Arrow Issue Type: Wish Reporter: Weston Pace Parsing the GDB output is error prone and can take considerable time. Also, some information is difficult or non-intuitive to get to (e.g. the name of a data type). We should add GDB pretty printers[1] to improve the debug workflow for developers. This could assist not just Arrow developers but also developers using the Arrow C++ libs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11347) [JavaScript] Consider Objects instead of Maps
[ https://issues.apache.org/jira/browse/ARROW-11347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270257#comment-17270257 ] Brian Hulette commented on ARROW-11347: --- Can you clarify where it is that we use Maps that you think should change? Is it when accessing an element of a Map-typed array? I'd be open to changing it, but we'd need to consider that this would be a breaking API change. I suppose this is technically OK since all releases are major, but it may be inconvenient for users. > [JavaScript] Consider Objects instead of Maps > - > > Key: ARROW-11347 > URL: https://issues.apache.org/jira/browse/ARROW-11347 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Dominik Moritz >Priority: Major > Labels: performance > Original Estimate: 24h > Remaining Estimate: 24h > > A quick experiment > (https://observablehq.com/@domoritz/performance-of-maps-vs-objects) seems to > show that object accesses are a lot faster than map accesses. Would it make > sense to switch to objects in the row API to improve performance? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11344) [Python] Data of struct fields are out-of-order in parquet files created by the write_table() method
[ https://issues.apache.org/jira/browse/ARROW-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270222#comment-17270222 ] Weston Pace commented on ARROW-11344: - Thank you for creating such a detailed test case. I have run your test against pyarrow 2.0.0 and I can confirm I get the same results that you do. Luckily, when I ran your test against the latest code I did not see this error, and I confirmed that the full_name.name column aligned with the fruit_name column. We have recently fixed issues related to structs, such as ARROW-10493, and my assumption is that you encountered one of those. We are on the verge of releasing 3.0.0. There is an RC available at [https://bintray.com/apache/arrow/python-rc/3.0.0-rc2#files/python-rc/3.0.0-rc2] if you would like to test this behavior out yourself sooner. > [Python] Data of struct fields are out-of-order in parquet files created by > the write_table() method > > > Key: ARROW-11344 > URL: https://issues.apache.org/jira/browse/ARROW-11344 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 2.0.0 >Reporter: Chen Ming >Priority: Major > Attachments: test_struct.csv, test_struct_200.parquet, > test_struct_200.py, test_struct_200_flat.parquet, test_struct_200_flat.py > > > Hi, > We found an out-of-order issue with the 'struct' data type recently, and would > like to know if you can help to root cause it. 
> {code:java} > import pandas as pd > import pyarrow as pa > import pyarrow.parquet as pq > df = pd.read_csv('./test_struct.csv') > print(df.dtypes) > df['full_name'] = df.apply(lambda x: {"package": x['file_package'], "name": > x["file_name"]}, axis=1) > my_df = df.drop(['file_package', 'file_name'], axis=1) > file_fields = [('package', pa.string()), ('name', pa.string()),] > my_schema = pa.schema([pa.field('full_name', pa.struct(file_fields)), >pa.field('fruit_name', pa.string())]) > my_table = pa.Table.from_pandas(my_df, schema = my_schema) > print('Table schema:') > print(my_table.schema) > pq.write_table(my_table, './test_struct_200.parquet') > {code} > The above code (attached as test_struct_200.py) runs with the following > python packages: > {code:java} > Pandas Version = 1.1.3 > PyArrow Version = 2.0.0 > {code} > Then I use parquet-tools (1.11.1) to read the file, but get the following > output: > {code:java} > $ java -jar parquet-tools-1.11.1.jar head -n 2181 test_struct_200.parquet > ... > full_name: > .package = fruit.zip > .name = apple.csv > fruit_name = strawberry > full_name: > .package = fruit.zip > .name = apple.csv > fruit_name = strawberry > full_name: > .package = fruit.zip > .name = apple.csv > fruit_name = strawberry > {code} > (BTW, you can also view the parquet file with > [http://parquet-viewer-online.com/]) > The output is supposed to be (refer to test_struct.csv) : > {code:java} > $ java -jar parquet-tools-1.11.1.jar head -n 2181 test_struct_200.parquet > ... 
> full_name: > .package = fruit.zip > .name = strawberry.csv > fruit_name = strawberry > full_name: > .package = fruit.zip > .name = strawberry.csv > fruit_name = strawberry > full_name: > .package = fruit.zip > .name = strawberry.csv > fruit_name = strawberry > {code} > As a comparison, the following code (attached as test_struct_200_flat.py) > would generate a parquet file with the same data of test_struct.csv: > {code:java} > import pandas as pd > import pyarrow as pa > import pyarrow.parquet as pq > df = pd.read_csv('./test_struct.csv') > print(df.dtypes) > my_schema = pa.schema([pa.field('file_package', pa.string()), >pa.field('file_name', pa.string()), >pa.field('fruit_name', pa.string())]) > my_table = pa.Table.from_pandas(df, schema = my_schema) > print('Table schema:') > print(my_table.schema) > pq.write_table(my_table, './test_struct_200_flat.parquet') > {code} > I also attached the two parquet files for your references. -- This message was sent by Atlassian Jira (v8.3.4#803005)
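The alignment the report checks (full_name.name lining up with fruit_name) amounts to the following invariant: a struct column is stored as separate child arrays, and row i of the struct must combine element i of every child and line up with element i of the sibling columns. A sketch of that invariant, with data mirroring the fruit example; the helper is illustrative, not a pyarrow API:

```python
# Child arrays of the struct column, plus a sibling flat column.
# If a writer misaligns child data, row i of full_name no longer
# corresponds to row i of fruit_name -- the bug class reported here.
full_name_package = ["fruit.zip", "fruit.zip"]
full_name_name = ["strawberry.csv", "apple.csv"]
fruit_name = ["strawberry", "apple"]

def rows(package, name, fruit):
    """Zip the columnar child arrays back into row-wise structs."""
    return [
        {"full_name": {"package": p, "name": n}, "fruit_name": f}
        for p, n, f in zip(package, name, fruit)
    ]

for row in rows(full_name_package, full_name_name, fruit_name):
    # "strawberry.csv" must travel with "strawberry", not "apple".
    assert row["full_name"]["name"].startswith(row["fruit_name"])
```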
[jira] [Resolved] (ARROW-11332) [Rust] Use MutableBuffer in take_string instead of Vec
[ https://issues.apache.org/jira/browse/ARROW-11332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb resolved ARROW-11332. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 9279 [https://github.com/apache/arrow/pull/9279] > [Rust] Use MutableBuffer in take_string instead of Vec > -- > > Key: ARROW-11332 > URL: https://issues.apache.org/jira/browse/ARROW-11332 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Daniël Heres >Assignee: Daniël Heres >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11347) [JavaScript] Consider Objects instead of Maps
[ https://issues.apache.org/jira/browse/ARROW-11347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270206#comment-17270206 ] Dominik Moritz commented on ARROW-11347: I wonder what [~bhulette] and [~paultaylor] say about this since they originally decided to go with Map. > [JavaScript] Consider Objects instead of Maps > - > > Key: ARROW-11347 > URL: https://issues.apache.org/jira/browse/ARROW-11347 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Dominik Moritz >Priority: Major > Labels: performance > Original Estimate: 24h > Remaining Estimate: 24h > > A quick experiment > (https://observablehq.com/@domoritz/performance-of-maps-vs-objects) seems to > show that object accesses are a lot faster than map accesses. Would it make > sense to switch to objects in the row API to improve performance? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-11347) [JavaScript] Consider Objects instead of Maps
[ https://issues.apache.org/jira/browse/ARROW-11347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270182#comment-17270182 ] Neville Dipale edited comment on ARROW-11347 at 1/22/21, 2:56 PM: -- Hi [~domoritz] The performance difference looks solid. I've tried that notebook on Chrome vs Safari (Macbook Air M1). Object: ~700ms vs ~2'600ms Map: ~5'300ms vs ~4'800ms On Chrome vs Firefox (Ryzen desktop) Object: ~700ms vs ~600ms Map: ~3'800ms vs ~ 11'600ms Do you think that there'd be a downside to using Object, in the ergonomics of the APIs? I haven't used the JS implementation enough to have an opinion, hence I'm asking. If you can open a PR with the change, we can review it and get it merged. Thanks was (Author: nevi_me): Hi [~domoritz] The performance difference looks solid. Do you think that there'd be a downside to using Object, in the ergonomics of the APIs? I haven't used the JS implementation enough to have an opinion, hence I'm asking. If you can open a PR with the change, we can review it and get it merged. Thanks > [JavaScript] Consider Objects instead of Maps > - > > Key: ARROW-11347 > URL: https://issues.apache.org/jira/browse/ARROW-11347 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Dominik Moritz >Priority: Major > Labels: performance > Original Estimate: 24h > Remaining Estimate: 24h > > A quick experiment > (https://observablehq.com/@domoritz/performance-of-maps-vs-objects) seems to > show that object accesses are a lot faster than map accesses. Would it make > sense to switch to objects in the row API to improve performance? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11347) [JavaScript] Consider Objects instead of Maps
[ https://issues.apache.org/jira/browse/ARROW-11347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270182#comment-17270182 ] Neville Dipale commented on ARROW-11347: Hi [~domoritz] The performance difference looks solid. Do you think that there'd be a downside to using Object, in the ergonomics of the APIs? I haven't used the JS implementation enough to have an opinion, hence I'm asking. If you can open a PR with the change, we can review it and get it merged. Thanks > [JavaScript] Consider Objects instead of Maps > - > > Key: ARROW-11347 > URL: https://issues.apache.org/jira/browse/ARROW-11347 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Dominik Moritz >Priority: Major > Labels: performance > Original Estimate: 24h > Remaining Estimate: 24h > > A quick experiment > (https://observablehq.com/@domoritz/performance-of-maps-vs-objects) seems to > show that object accesses are a lot faster than map accesses. Would it make > sense to switch to objects in the row API to improve performance? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11347) [JavaScript] Consider Objects instead of Maps
[ https://issues.apache.org/jira/browse/ARROW-11347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-11347: --- Summary: [JavaScript] Consider Objects instead of Maps (was: Consider Objects instead of Maps) > [JavaScript] Consider Objects instead of Maps > - > > Key: ARROW-11347 > URL: https://issues.apache.org/jira/browse/ARROW-11347 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Dominik Moritz >Priority: Major > Labels: performance > Original Estimate: 24h > Remaining Estimate: 24h > > A quick experiment > (https://observablehq.com/@domoritz/performance-of-maps-vs-objects) seems to > show that object accesses are a lot faster than map accesses. Would it make > sense to switch to objects in the row API to improve performance? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11299) [Python] build warning in python
[ https://issues.apache.org/jira/browse/ARROW-11299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Kietzman updated ARROW-11299: - Affects Version/s: 2.0.0 > [Python] build warning in python > > > Key: ARROW-11299 > URL: https://issues.apache.org/jira/browse/ARROW-11299 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 2.0.0 >Reporter: Yibo Cai >Assignee: Yibo Cai >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Many warnings about compute kernel options when building Arrow python. > Removing the line below suppresses the warnings. > https://github.com/apache/arrow/blob/140135908c5d131ceac31a0e529f9b9b763b1106/cpp/src/arrow/compute/function.h#L45 > I think the reason is that the virtual destructor makes the structure > non-standard-layout (not C compatible), so the offsetof macro cannot be used > safely. As function options are straightforward, the destructor looks > unnecessary. [~bkietz] > *Steps to reproduce* > build arrow cpp > {code:bash} > ~/arrow/cpp/release $ cmake -GNinja -DCMAKE_BUILD_TYPE=Release > -DARROW_COMPUTE=ON -DARROW_BUILD_TESTS=ON > -DCMAKE_INSTALL_PREFIX=$(pwd)/_install -DCMAKE_INSTALL_LIBDIR=lib > -DARROW_PYTHON=ON -DCMAKE_CXX_COMPILER=/usr/bin/clang++-9 > -DCMAKE_C_COMPILER=/usr/bin/clang-9 .. > ~/arrow/cpp/release $ ninja install > {code} > build arrow python > {code:bash} > ~/arrow/python $ python --version > Python 3.6.9 > ~/arrow/python $ python setup.py build_ext --inplace > .. 
> [ 93%] Building CXX object CMakeFiles/_compute.dir/_compute.cpp.o [27/1691] > In file included from > /usr/include/x86_64-linux-gnu/bits/types/stack_t.h:23:0, > from /usr/include/signal.h:303, > from > /home/cyb/archery/lib/python3.6/site-packages/numpy/core/include/numpy/npy_interrupt.h:84, > from > /home/cyb/archery/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:5, > from > /home/cyb/arrow/cpp/release/_install/include/arrow/python/numpy_interop.h:41, > from /home/cyb/arrow/cpp/release/_install/include/arrow/python/helpers.h:27, > from /home/cyb/arrow/cpp/release/_install/include/arrow/python/api.h:24, > from /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:696: > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp: In function > ‘int __Pyx_modinit_type_init_code()’: > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26034:146: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__CastOptions’ is undefined [-Winvalid-offsetof] > x_type_7pyarrow_8_compute__CastOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__CastOptions, __pyx_base.__pyx_base.__weakref__); > ^ > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26066:150: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__FilterOptions’ is undefined > [-Winvalid-offsetof] > type_7pyarrow_8_compute__FilterOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__FilterOptions, > __pyx_base.__pyx_base.__weakref__); > ^ > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26082:146: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__TakeOptions’ is undefined [-Winvalid-offsetof] > x_type_7pyarrow_8_compute__TakeOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__TakeOptions, __pyx_base.__pyx_base.__weakref__); > ^ > 
/home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26130:150: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__MinMaxOptions’ is undefined > [-Winvalid-offsetof] > type_7pyarrow_8_compute__MinMaxOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__MinMaxOptions, > __pyx_base.__pyx_base.__weakref__); > ^ > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26146:148: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__CountOptions’ is undefined [-Winvalid-offsetof] > _type_7pyarrow_8_compute__CountOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__CountOptions, > __pyx_base.__pyx_base.__weakref__); > ^ > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26162:146: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__ModeOptions’ is undefined [-Winvalid-offsetof] > x_type_7pyarrow_8_compute__ModeOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__ModeOptions, __pyx_base.__pyx_base.__weakref__); > ^ > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26210:154: > warning: offsetof within non-standard-lay
[jira] [Assigned] (ARROW-11299) [Python] build warning in python
[ https://issues.apache.org/jira/browse/ARROW-11299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Kietzman reassigned ARROW-11299: Assignee: Yibo Cai > [Python] build warning in python > > > Key: ARROW-11299 > URL: https://issues.apache.org/jira/browse/ARROW-11299 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Reporter: Yibo Cai >Assignee: Yibo Cai >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > Many warnings about compute kernel options when building Arrow python. > Removing the line below suppresses the warnings. > https://github.com/apache/arrow/blob/140135908c5d131ceac31a0e529f9b9b763b1106/cpp/src/arrow/compute/function.h#L45 > I think the reason is that the virtual destructor makes the structure > non-standard-layout (not C compatible), so the offsetof macro cannot be used > safely. As function options are straightforward, the destructor looks > unnecessary. [~bkietz] > *Steps to reproduce* > build arrow cpp > {code:bash} > ~/arrow/cpp/release $ cmake -GNinja -DCMAKE_BUILD_TYPE=Release > -DARROW_COMPUTE=ON -DARROW_BUILD_TESTS=ON > -DCMAKE_INSTALL_PREFIX=$(pwd)/_install -DCMAKE_INSTALL_LIBDIR=lib > -DARROW_PYTHON=ON -DCMAKE_CXX_COMPILER=/usr/bin/clang++-9 > -DCMAKE_C_COMPILER=/usr/bin/clang-9 .. > ~/arrow/cpp/release $ ninja install > {code} > build arrow python > {code:bash} > ~/arrow/python $ python --version > Python 3.6.9 > ~/arrow/python $ python setup.py build_ext --inplace > .. 
> [ 93%] Building CXX object CMakeFiles/_compute.dir/_compute.cpp.o [27/1691] > In file included from > /usr/include/x86_64-linux-gnu/bits/types/stack_t.h:23:0, > from /usr/include/signal.h:303, > from > /home/cyb/archery/lib/python3.6/site-packages/numpy/core/include/numpy/npy_interrupt.h:84, > from > /home/cyb/archery/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:5, > from > /home/cyb/arrow/cpp/release/_install/include/arrow/python/numpy_interop.h:41, > from /home/cyb/arrow/cpp/release/_install/include/arrow/python/helpers.h:27, > from /home/cyb/arrow/cpp/release/_install/include/arrow/python/api.h:24, > from /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:696: > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp: In function > ‘int __Pyx_modinit_type_init_code()’: > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26034:146: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__CastOptions’ is undefined [-Winvalid-offsetof] > x_type_7pyarrow_8_compute__CastOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__CastOptions, __pyx_base.__pyx_base.__weakref__); > ^ > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26066:150: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__FilterOptions’ is undefined > [-Winvalid-offsetof] > type_7pyarrow_8_compute__FilterOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__FilterOptions, > __pyx_base.__pyx_base.__weakref__); > ^ > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26082:146: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__TakeOptions’ is undefined [-Winvalid-offsetof] > x_type_7pyarrow_8_compute__TakeOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__TakeOptions, __pyx_base.__pyx_base.__weakref__); > ^ > 
/home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26130:150: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__MinMaxOptions’ is undefined > [-Winvalid-offsetof] > type_7pyarrow_8_compute__MinMaxOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__MinMaxOptions, > __pyx_base.__pyx_base.__weakref__); > ^ > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26146:148: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__CountOptions’ is undefined [-Winvalid-offsetof] > _type_7pyarrow_8_compute__CountOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__CountOptions, > __pyx_base.__pyx_base.__weakref__); > ^ > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26162:146: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__ModeOptions’ is undefined [-Winvalid-offsetof] > x_type_7pyarrow_8_compute__ModeOptions.tp_weaklistoffset = offsetof(struct > __pyx_obj_7pyarrow_8_compute__ModeOptions, __pyx_base.__pyx_base.__weakref__); > ^ > /home/cyb/arrow/python/build/temp.linux-x86_64-3.6/_compute.cpp:26210:154: > warning: offsetof within non-standard-layout type > ‘__pyx_obj_7pyarrow_8_compute__VarianceOptions’
[jira] [Updated] (ARROW-11299) [Python] build warning in python
[ https://issues.apache.org/jira/browse/ARROW-11299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Kietzman updated ARROW-11299: - Fix Version/s: 4.0.0 > [Python] build warning in python > > > Key: ARROW-11299 > URL: https://issues.apache.org/jira/browse/ARROW-11299 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Reporter: Yibo Cai >Assignee: Yibo Cai >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Many warnings about compute kernel options appear when building Arrow Python. > Removing the line below suppresses the warnings. > https://github.com/apache/arrow/blob/140135908c5d131ceac31a0e529f9b9b763b1106/cpp/src/arrow/compute/function.h#L45 > I think the reason is that the virtual destructor makes the struct non-standard-layout (no longer C-compatible), > so the offsetof macro cannot be used on it safely. > As the function options are straightforward, the destructor looks unnecessary. > [~bkietz] > *Steps to reproduce* > build arrow cpp > {code:bash} > ~/arrow/cpp/release $ cmake -GNinja -DCMAKE_BUILD_TYPE=Release > -DARROW_COMPUTE=ON -DARROW_BUILD_TESTS=ON > -DCMAKE_INSTALL_PREFIX=$(pwd)/_install -DCMAKE_INSTALL_LIBDIR=lib > -DARROW_PYTHON=ON -DCMAKE_CXX_COMPILER=/usr/bin/clang++-9 > -DCMAKE_C_COMPILER=/usr/bin/clang-9 .. > ~/arrow/cpp/release $ ninja install > {code} > build arrow python > {code:bash} > ~/arrow/python $ python --version > Python 3.6.9 > ~/arrow/python $ python setup.py build_ext --inplace > .. 
[jira] [Comment Edited] (ARROW-9745) [Python] Reading Parquet file crashes on windows - python3.8
[ https://issues.apache.org/jira/browse/ARROW-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270118#comment-17270118 ] Maximilian Speicher edited comment on ARROW-9745 at 1/22/21, 1:13 PM: -- For me the same error persists even after doing a clean reinstall of Python and recreating the venv. It somehow seems to be related to snappy compression, as it works fine when using gzip as the compression. *Update:* Running the same code on the same machine inside of WSL works just fine. was (Author: mspeicher): For me the same error persists even after doing a clean reinstall of Python and recreating the venv. It somehow seems to be related to snappy compression, as it works fine when using gzip as the compression. > [Python] Reading Parquet file crashes on windows - python3.8 > > > Key: ARROW-9745 > URL: https://issues.apache.org/jira/browse/ARROW-9745 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 1.0.0 > Environment: Installation done with pip: > pip install pyarrow pandas > for python3.8 on a windows machine running Windows 10 Enterprise (v1809). The > resulting wheel is: > pyarrow-1.0.0-cp38-cp38-win_amd64.whl >Reporter: Dylan Modesitt >Priority: Major > Labels: parquet > > {code:python} > import pandas as pd > import numpy as np > df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), > columns=list("1234")) > df.to_parquet("the.parquet") > pd.read_parquet("the.parquet") # fails here > {code} > fails with > {code} > Process finished with exit code -1073741795 (0xC000001D) > {code} > {code:python} > pyarrow.parquet.read_pandas(pyarrow.BufferReader(...)).to_pandas() > {code} > also fails with the same exit message. Has this been seen before? Is there a > known solution? I experienced the same issue installing the pyarrow nightlies > as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-9745) [Python] Reading Parquet file crashes on windows - python3.8
[ https://issues.apache.org/jira/browse/ARROW-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270118#comment-17270118 ] Maximilian Speicher commented on ARROW-9745: For me the same error persists even after doing a clean reinstall of Python and recreating the venv. It somehow seems to be related to snappy compression, as it works fine when using gzip as the compression. > [Python] Reading Parquet file crashes on windows - python3.8 > > > Key: ARROW-9745 > URL: https://issues.apache.org/jira/browse/ARROW-9745 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 1.0.0 > Environment: Installation done with pip: > pip install pyarrow pandas > for python3.8 on a windows machine running Windows 10 Enterprise (v1809). The > resulting wheel is: > pyarrow-1.0.0-cp38-cp38-win_amd64.whl >Reporter: Dylan Modesitt >Priority: Major > Labels: parquet > > {code:python} > import pandas as pd > import numpy as np > df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), > columns=list("1234")) > df.to_parquet("the.parquet") > pd.read_parquet("the.parquet") # fails here > {code} > fails with > {code} > Process finished with exit code -1073741795 (0xC000001D) > {code} > {code:python} > pyarrow.parquet.read_pandas(pyarrow.BufferReader(...)).to_pandas() > {code} > also fails with the same exit message. Has this been seen before? Is there a > known solution? I experienced the same issue installing the pyarrow nightlies > as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-10766) [Rust] Compute nested definition and repetition for list arrays
[ https://issues.apache.org/jira/browse/ARROW-10766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb resolved ARROW-10766. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 9240 [https://github.com/apache/arrow/pull/9240] > [Rust] Compute nested definition and repetition for list arrays > --- > > Key: ARROW-10766 > URL: https://issues.apache.org/jira/browse/ARROW-10766 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust >Reporter: Neville Dipale >Assignee: Neville Dipale >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > This extends on ARROW-9728 by only focusing on list array repetition and > definition levels -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-11343) [DataFusion] Simplified example
[ https://issues.apache.org/jira/browse/ARROW-11343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb resolved ARROW-11343. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 9290 [https://github.com/apache/arrow/pull/9290] > [DataFusion] Simplified example > --- > > Key: ARROW-11343 > URL: https://issues.apache.org/jira/browse/ARROW-11343 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust - DataFusion >Reporter: Jorge Leitão >Assignee: Jorge Leitão >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11346) [C++][Compute] Implement quantile kernel benchmark
[ https://issues.apache.org/jira/browse/ARROW-11346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11346: --- Labels: pull-request-available (was: ) > [C++][Compute] Implement quantile kernel benchmark > -- > > Key: ARROW-11346 > URL: https://issues.apache.org/jira/browse/ARROW-11346 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Yibo Cai >Assignee: Yibo Cai >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11347) Consider Objects instead of Maps
Dominik Moritz created ARROW-11347: -- Summary: Consider Objects instead of Maps Key: ARROW-11347 URL: https://issues.apache.org/jira/browse/ARROW-11347 Project: Apache Arrow Issue Type: Improvement Components: JavaScript Reporter: Dominik Moritz A quick experiment (https://observablehq.com/@domoritz/performance-of-maps-vs-objects) seems to show that object accesses are a lot faster than map accesses. Would it make sense to switch to objects in the row API to improve performance? -- This message was sent by Atlassian Jira (v8.3.4#803005)
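A minimal sketch of the comparison behind the linked experiment (loop size and field name are illustrative): timing repeated reads of the same field through a plain object versus a Map.

```javascript
const N = 1_000_000;

const obj = { value: 1 };
const map = new Map([["value", 1]]);

// Property access on a plain object: typically inline-cached by JS engines.
let objSum = 0;
console.time("object access");
for (let i = 0; i < N; i++) objSum += obj.value;
console.timeEnd("object access");

// Map.get goes through a keyed lookup on every call.
let mapSum = 0;
console.time("map access");
for (let i = 0; i < N; i++) mapSum += map.get("value");
console.timeEnd("map access");
```

Micro-benchmarks like this are sensitive to engine optimizations, so any row-API change would want validation against realistic Arrow workloads.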
[jira] [Created] (ARROW-11346) [C++][Compute] Implement quantile kernel benchmark
Yibo Cai created ARROW-11346: Summary: [C++][Compute] Implement quantile kernel benchmark Key: ARROW-11346 URL: https://issues.apache.org/jira/browse/ARROW-11346 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Yibo Cai Assignee: Yibo Cai -- This message was sent by Atlassian Jira (v8.3.4#803005)