[jira] [Commented] (ARROW-4930) [Python] Remove LIBDIR assumptions in Python build
[ https://issues.apache.org/jira/browse/ARROW-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938260#comment-16938260 ] Suvayu Ali commented on ARROW-4930: --- Hi [~kou], I'm a bit out of my depth here, but here's my attempt: https://github.com/apache/arrow/pull/5504
[jira] [Commented] (ARROW-4930) [Python] Remove LIBDIR assumptions in Python build
[ https://issues.apache.org/jira/browse/ARROW-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936383#comment-16936383 ] Suvayu Ali commented on ARROW-4930: --- Hi [~apitrou], I have had limited success so far. [I was working off of master; {{git describe}} says: {{apache-arrow-0.14.0-584-g176adf5a0}}] This is what I found:

1. {{setup.py}} assumes the library directory is {{$ARROW_HOME/lib}} when setting {{PKG_CONFIG_PATH}} in the environment (line 253). I believe this is a bit of a hack, which the author also mentions in ARROW-1090, the issue that tracked that change. The resolution should be somewhere in the cmake scripts.
2. I successfully detected {{libarrow}} with the attached patch [^FindArrow.cmake.patch].
3. However, I then failed to detect {{libparquet}}. On further investigation I found (AFAIU) that even though {{FindParquet.cmake}} sets {{ARROW_HOME}}, it is not used. However, it does use {{PARQUET_HOME}}. Since my CMake foo is a bit weak, I worked up a similar patch [^FindParquet.cmake.patch] as before and set {{export PARQUET_HOME=$ARROW_HOME}} in the terminal. This allowed the compilation to succeed.

The compilation commands I used for C++ and Python are:
{code:java}
$ cmake -G Ninja -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
    -DARROW_FLIGHT=ON -DARROW_GANDIVA=ON -DARROW_ORC=ON \
    -DARROW_PARQUET=ON -DPYTHON_EXECUTABLE=/usr/bin/python3.7m \
    -DARROW_PYTHON=ON -DARROW_PLASMA=ON \
    -DARROW_BUILD_TESTS=ON -DLLVM_DIR=/usr/lib64/llvm7.0 ..
$ python3 setup.py build_ext --cmake-generator Ninja --inplace
{code}
I then tried to run the python tests with {{pytest-3 pyarrow}}. The summary was:
{quote}5 failed, 1411 passed, 59 skipped, 4 xfailed, 29 warnings in 28.30 seconds
{quote}
The failures all look like setup-related issues: not being able to import, not being able to start plasma, etc. I'll investigate further, but my take is that the cmake scripts don't have _one way_ of detecting the libraries, which makes it very difficult to configure things properly from setup.py.
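To make the fallback idea behind [^FindParquet.cmake.patch] concrete, here is a minimal sketch (hypothetical snippet, not the actual contents of the attached patch): let {{FindParquet.cmake}} fall back to {{ARROW_HOME}} when {{PARQUET_HOME}} is unset, so the extra {{export}} becomes unnecessary.
{code}
# Hypothetical fallback for FindParquet.cmake: if PARQUET_HOME is not in
# the environment, reuse ARROW_HOME, since both libraries are usually
# installed under the same prefix.
if(NOT DEFINED ENV{PARQUET_HOME} AND DEFINED ENV{ARROW_HOME})
  set(PARQUET_HOME "$ENV{ARROW_HOME}")
else()
  set(PARQUET_HOME "$ENV{PARQUET_HOME}")
endif()
{code}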
[jira] [Updated] (ARROW-4930) [Python] Remove LIBDIR assumptions in Python build
[ https://issues.apache.org/jira/browse/ARROW-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suvayu Ali updated ARROW-4930: -- Attachment: FindParquet.cmake.patch FindArrow.cmake.patch
[jira] [Commented] (ARROW-4930) [Python] Remove LIBDIR assumptions in Python build
[ https://issues.apache.org/jira/browse/ARROW-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933039#comment-16933039 ] Suvayu Ali commented on ARROW-4930: --- I have some time this weekend, so I'll have a go at it.
[jira] [Commented] (ARROW-6577) Dependency conflict in conda packages
[ https://issues.apache.org/jira/browse/ARROW-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931697#comment-16931697 ] Suvayu Ali commented on ARROW-6577: --- For completeness: I managed to upgrade {{conda}} to 4.7.11, and the problem no longer occurs.
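(For anyone else stuck on an old base install, the upgrade itself is just the usual in-place update, assuming a default conda layout:)
{code}
$ conda update -n base conda   # 4.6.13 -> 4.7.11 in my case
{code}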
[jira] [Commented] (ARROW-6577) Dependency conflict in conda packages
[ https://issues.apache.org/jira/browse/ARROW-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931383#comment-16931383 ] Suvayu Ali commented on ARROW-6577: --- [~Igor Yastrebov] Thanks a lot, I'll see if I can upgrade {{conda}}. My issues were also mostly with boost.
[jira] [Commented] (ARROW-6577) Dependency conflict in conda packages
[ https://issues.apache.org/jira/browse/ARROW-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931357#comment-16931357 ] Suvayu Ali commented on ARROW-6577: --- [~Igor Yastrebov] Yes. [~xhochy] Hmm, it's not easy for me to upgrade conda itself. Thanks for investigating; I'll see what I can do.
[jira] [Comment Edited] (ARROW-6577) Dependency conflict in conda packages
[ https://issues.apache.org/jira/browse/ARROW-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931333#comment-16931333 ] Suvayu Ali edited comment on ARROW-6577 at 9/17/19 11:26 AM: - Hi Uwe, I mentioned the conda version in the Environment field above (4.6.13), and my condarc looks like this:
{code}
channels:
  - conda-forge
  - defaults
channel_priority: strict
auto_activate_base: true
pip_interop_enabled: true
{code}
I have also seen this on my colleague's Mac (I don't know the environment details).
[jira] [Updated] (ARROW-6577) Dependency conflict in conda packages
[ https://issues.apache.org/jira/browse/ARROW-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suvayu Ali updated ARROW-6577: -- Description: When I install pyarrow in a fresh environment, the latest version (0.14.1) is picked up. But installing certain packages downgrades pyarrow to 0.13.0 or 0.12.1. I think a common dependency is causing the downgrade; my guess is boost or protobuf. This is based on several instances of this issue I encountered over the last few weeks. It took me a while to find a somewhat reproducible recipe.
{code:java}
$ conda create -n test pyarrow pandas numpy
...
Proceed ([y]/n)? y
...
$ conda install -n test ipython
...
Proceed ([y]/n)? n
CondaSystemExit: Exiting.
{code}
I have attached a mildly edited (to remove progress bars and control characters) transcript of this session. Here {{ipython}} triggers the problem and downgrades {{pyarrow}} to 0.12.1, but I think there are other common packages that also conflict in this way. Please let me know if I can provide more info.
[jira] [Created] (ARROW-6577) Dependency conflict in conda packages
Suvayu Ali created ARROW-6577:
-
Summary: Dependency conflict in conda packages
Key: ARROW-6577
URL: https://issues.apache.org/jira/browse/ARROW-6577
Project: Apache Arrow
Issue Type: Bug
Components: Packaging
Affects Versions: 0.14.1
Environment: kernel: 5.2.11-200.fc30.x86_64
conda 4.6.13
Python 3.7.3
Reporter: Suvayu Ali
Attachments: pa-conda.txt

When I install pyarrow in a fresh environment, the latest version (0.14.1) is picked up. But installing certain packages downgrades pyarrow to 0.13.0 or 0.12.1. I think a common dependency is causing the downgrade; my guess is boost. This is based on several instances of this issue I encountered over the last few weeks. It took me a while to find a somewhat reproducible recipe.
{code}
$ conda create -n test pyarrow pandas numpy
...
Proceed ([y]/n)? y
...
$ conda install -n test ipython
...
Proceed ([y]/n)? n
CondaSystemExit: Exiting.
{code}
I have attached a mildly edited (to remove progress bars, and control characters) transcript of this session. Here {{ipython}} triggers the problem, and downgrades {{pyarrow}} to 0.12.1, but I think there are other common packages that also conflict in this way. Please let me know if I can provide more info.
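(A possible stopgap I have not verified: pinning pyarrow explicitly in the same transaction should force the solver to either keep 0.14.1 or report the conflict loudly instead of silently downgrading:)
{code}
$ conda install -n test ipython pyarrow=0.14.1
{code}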
[jira] [Commented] (ARROW-5871) [Python] Can't import pyarrow 0.14.0 due to mismatching libcrypt
[ https://issues.apache.org/jira/browse/ARROW-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884841#comment-16884841 ] Suvayu Ali commented on ARROW-5871: --- Hi [~wesmckinn], I was able to build arrow-cpp and pyarrow from source from the maint-0.14.x branch. Although I have not done any testing, like installing the wheel on different platforms, the above crash does not happen when I do a simple import.
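(For the record, the checkout itself was nothing special; assuming the usual clone, something like:)
{code}
$ git clone https://github.com/apache/arrow.git && cd arrow
$ git checkout maint-0.14.x
# then the usual arrow-cpp cmake build followed by
# python3 setup.py build_ext, per the Python development docs
{code}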
[jira] [Commented] (ARROW-5871) [Python] Can't import pyarrow 0.14.0 due to mismatching libcrypt
[ https://issues.apache.org/jira/browse/ARROW-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881214#comment-16881214 ] Suvayu Ali commented on ARROW-5871: --- I think those are the instructions I followed the last time I tried (around March); it even led to a patch or two. I'll give it another go this weekend.
[jira] [Commented] (ARROW-5871) [Python] Can't import pyarrow 0.14.0 due to mismatching libcrypt
[ https://issues.apache.org/jira/browse/ARROW-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880948#comment-16880948 ] Suvayu Ali commented on ARROW-5871: --- Hi [~wesmckinn], I read that issue. Unfortunately my experience with conda has been rather frustrating, so I think for production use I'll stick to 0.13.0 for now and try to compile from source for experimental use. I have never successfully managed to compile pyarrow before, though (no issues with the C++ library). Thanks a lot.
[jira] [Commented] (ARROW-5871) [Python] Can't import pyarrow 0.14.0 due to mismatching libcrypt
[ https://issues.apache.org/jira/browse/ARROW-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880548#comment-16880548 ] Suvayu Ali commented on ARROW-5871: --- Hi [~wesmckinn], I see the same issue with the manylinux1 wheel.
[jira] [Updated] (ARROW-5871) Can't import pyarrow 0.14.0 due to mismatching libcrypt
[ https://issues.apache.org/jira/browse/ARROW-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suvayu Ali updated ARROW-5871: -- Description: In a freshly created virtual environment, after I install pyarrow 0.14.0 (using pip), importing pyarrow from the python prompt leads to a crash:
{code:java}
$ mktmpenv
[..]
This is a temporary environment. It will be deleted when you run 'deactivate'.
$ pip install pyarrow
Collecting pyarrow
  Using cached https://files.pythonhosted.org/packages/8f/fa/407667d763c25c3d9977e1d19038df3b4a693f37789c4fe1fe5c74a6bc55/pyarrow-0.14.0-cp37-cp37m-manylinux2010_x86_64.whl
Collecting numpy>=1.14 (from pyarrow)
  Using cached https://files.pythonhosted.org/packages/fc/d1/45be1144b03b6b1e24f9a924f23f66b4ad030d834ad31fb9e5581bd328af/numpy-1.16.4-cp37-cp37m-manylinux1_x86_64.whl
Collecting six>=1.0.0 (from pyarrow)
  Using cached https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl
Installing collected packages: numpy, six, pyarrow
Successfully installed numpy-1.16.4 pyarrow-0.14.0 six-1.12.0
$ python --version
Python 3.7.3
$ python -m pyarrow
Traceback (most recent call last):
  File "/usr/lib64/python3.7/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib64/python3.7/runpy.py", line 142, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/usr/lib64/python3.7/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/home/user/.virtualenvs/tmp-8a4d52e7bb62853/lib/python3.7/site-packages/pyarrow/__init__.py", line 49, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: libcrypt.so.1: cannot open shared object file: No such file or directory{code}
This is surprising because I have older versions of pyarrow (up to 0.13.0) working, and libcrypt on my system (Fedora 30, Python 3.7) is libcrypt.so.2!
[jira] [Created] (ARROW-5871) Can't import pyarrow 0.14.0 due to mismatching libcrypt
Suvayu Ali created ARROW-5871:
-
Summary: Can't import pyarrow 0.14.0 due to mismatching libcrypt
Key: ARROW-5871
URL: https://issues.apache.org/jira/browse/ARROW-5871
Project: Apache Arrow
Issue Type: Bug
Components: Packaging
Affects Versions: 0.14.0
Environment: 5.1.16-300.fc30.x86_64
Python 3.7.3
libxcrypt-4.4.6-2.fc30.x86_64
Reporter: Suvayu Ali

In a freshly created virtual environment, after I install pyarrow 0.14.0 (using pip), importing pyarrow from the python prompt leads to a crash:
{code:java}
$ mktmpenv
[..]
This is a temporary environment. It will be deleted when you run 'deactivate'.
$ pip install pyarrow
Collecting pyarrow
  Using cached https://files.pythonhosted.org/packages/8f/fa/407667d763c25c3d9977e1d19038df3b4a693f37789c4fe1fe5c74a6bc55/pyarrow-0.14.0-cp37-cp37m-manylinux2010_x86_64.whl
Collecting numpy>=1.14 (from pyarrow)
  Using cached https://files.pythonhosted.org/packages/fc/d1/45be1144b03b6b1e24f9a924f23f66b4ad030d834ad31fb9e5581bd328af/numpy-1.16.4-cp37-cp37m-manylinux1_x86_64.whl
Collecting six>=1.0.0 (from pyarrow)
  Using cached https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl
Installing collected packages: numpy, six, pyarrow
Successfully installed numpy-1.16.4 pyarrow-0.14.0 six-1.12.0
$ python --version
Python 3.7.3
$ python -m pyarrow
Traceback (most recent call last):
  File "/usr/lib64/python3.7/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib64/python3.7/runpy.py", line 142, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/usr/lib64/python3.7/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/home/jallad/.virtualenvs/tmp-8a4d52e7bb62853/lib/python3.7/site-packages/pyarrow/__init__.py", line 49, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: libcrypt.so.1: cannot open shared object file: No such file or directory{code}
This is surprising because I have older versions of pyarrow (up to 0.13.0) working, and libcrypt on my system (Fedora 30, Python 3.7) is libcrypt.so.2!
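(Not a fix for the wheel itself, but a local workaround that should unblock imports on Fedora 30, assuming the only problem is the missing old soname: the compat package ships libcrypt.so.1 alongside the new libcrypt.so.2.)
{code}
$ sudo dnf install libxcrypt-compat   # provides libcrypt.so.1 on Fedora 30
{code}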
[jira] [Created] (ARROW-4930) Remove LIBDIR assumptions in Python build
Suvayu Ali created ARROW-4930:
-
Summary: Remove LIBDIR assumptions in Python build
Key: ARROW-4930
URL: https://issues.apache.org/jira/browse/ARROW-4930
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Affects Versions: 0.12.1
Reporter: Suvayu Ali

This is in reference to (4) in [this|http://mail-archives.apache.org/mod_mbox/arrow-dev/201903.mbox/%3C0AF328A1-ED2A-457F-B72D-3B49C8614850%40xhochy.com%3E] mailing list discussion. Certain sections of setup.py assume a specific location of the C++ libraries. Removing this hard assumption will simplify PyArrow builds significantly. As far as I could tell, these assumptions are made in the {{build_ext._run_cmake()}} method (wherever bundling of the C++ libraries is handled).
# The first occurrence is before invoking cmake (see line 237).
# The second occurrence is when the C++ libraries are moved from their build directory to the Python tree (see line 347). The actual implementation is in the function {{_move_shared_libs_unix(..)}} (see line 468).
Hope this helps.
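As a rough illustration of the direction I mean (a hypothetical helper, not the current setup.py code): instead of hard-coding {{$ARROW_HOME/lib}}, setup.py could ask pkg-config where libarrow actually lives, since the C++ build installs an {{arrow.pc}} file.
{code:python}
# Hypothetical sketch: locate the Arrow library directory via pkg-config
# instead of assuming $ARROW_HOME/lib.
import subprocess

def arrow_libdir():
    try:
        out = subprocess.check_output(
            ["pkg-config", "--variable=libdir", "arrow"])
    except (OSError, subprocess.CalledProcessError):
        return None  # arrow.pc not on PKG_CONFIG_PATH
    return out.decode().strip() or None
{code}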
[jira] [Created] (ARROW-4814) [Python] Exception when writing nested columns that are tuples to parquet
Suvayu Ali created ARROW-4814:
-
Summary: [Python] Exception when writing nested columns that are tuples to parquet
Key: ARROW-4814
URL: https://issues.apache.org/jira/browse/ARROW-4814
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.12.1
Environment: 4.20.8-100.fc28.x86_64
Reporter: Suvayu Ali
Attachments: df_to_parquet_fail.py, test.csv

I get an exception when I try to write a {{pandas.DataFrame}} to a parquet file where one of the columns contains tuples. I use tuples here because they allow for easier querying in pandas (see ARROW-3806 for a more detailed description).
{code}
Traceback (most recent call last):
  File "df_to_parquet_fail.py", line 5, in <module>
    df.to_parquet("test.parquet") # crashes
  File "/home/user/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2203, in to_parquet
    partition_cols=partition_cols, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/pandas/io/parquet.py", line 252, in to_parquet
    partition_cols=partition_cols, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/pandas/io/parquet.py", line 113, in write
    table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
  File "pyarrow/table.pxi", line 1141, in pyarrow.lib.Table.from_pandas
  File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 431, in dataframe_to_arrays
    convert_types)]
  File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 430, in <listcomp>
    for c, t in zip(columns_to_convert,
  File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 426, in convert_column
    raise e
  File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 420, in convert_column
    return pa.array(col, type=ty, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 176, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ("Could not convert ('G',) with type tuple: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column ALTS with type object')
{code}
The issue may be replicated with the attached script and csv file.
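A minimal sketch of a possible workaround (untested here; the column name and values are taken from the traceback above): pyarrow does infer Arrow list types from Python lists, so converting the tuple-valued column to lists before writing may avoid the error.
{code:python}
import pandas as pd

# Hypothetical frame mirroring the failing column: tuples are not
# recognized by Arrow's type inference, but lists are.
df = pd.DataFrame({"ALTS": [("G",), ("A", "T")]})
df["ALTS"] = df["ALTS"].apply(list)  # tuple -> list per row
df.to_parquet("test.parquet")        # should now pass type inference
{code}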
[jira] [Commented] (ARROW-3874) [Gandiva] Cannot build: LLVM not detected correctly
[ https://issues.apache.org/jira/browse/ARROW-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708321#comment-16708321 ] Suvayu Ali commented on ARROW-3874: --- Since I'm using {{java-1.8.0-openjdk}}, I had to install {{java-1.8.0-openjdk-devel}} to get {{jni.h}}. For other java versions on F29, it should be {{java-<version>-openjdk-devel}}.
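For anyone searching later, the fix is a one-liner (package name as above; adjust the version to match your JDK):
{code}
$ sudo dnf install java-1.8.0-openjdk-devel   # provides jni.h on Fedora 29
{code}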
[jira] [Commented] (ARROW-3874) [Gandiva] Cannot build: LLVM not detected correctly
[ https://issues.apache.org/jira/browse/ARROW-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707086#comment-16707086 ] Suvayu Ali commented on ARROW-3874: --- Done: [https://github.com/apache/arrow/pull/3072] Your question about {{jni.h}} gave me enough hints to find the correct missing package :), and now the build progresses until it fails with:
{code}
Scanning dependencies of target csv-chunker-test
CMakeFiles/json-integration-test.dir/json-integration-test.cc.o:json-integration-test.cc:function boost::system::error_category::std_category::equivalent(std::error_code const&, int) const: error: undefined reference to 'boost::system::detail::generic_category_ncx()'
{code}
This is strange because I have {{boost-system-1.66.0-14.fc29.x86_64}} installed on my system. But I guess that's a test, and the libraries were built successfully.
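(A guess, not a confirmed diagnosis: that undefined-reference pattern usually means the Boost headers seen at compile time are newer or older than the libboost_system being linked, e.g. a system copy mixed with a vendored or toolchain copy. Worth checking which Boosts are visible:)
{code}
$ rpm -q boost-devel boost-system      # header and library packages should match
$ ldconfig -p | grep libboost_system   # runtime copies the linker can see
{code}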
[jira] [Comment Edited] (ARROW-3874) [Gandiva] Cannot build: LLVM not detected correctly
[ https://issues.apache.org/jira/browse/ARROW-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16702776#comment-16702776 ] Suvayu Ali edited comment on ARROW-3874 at 11/29/18 6:42 AM: - Okay, to summarise: my initial build issue on F28 was resolved by installing the llvm-static libraries. On F29, cmake cannot find the correct version of LLVM.
{code}
$ export ARROW_HOME=~/opt
$ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
    -DARROW_PARQUET=on -DARROW_ORC=ON -DARROW_PLASMA=on -DARROW_GANDIVA=ON ../
[...]
CMake Error at cmake_modules/FindLLVM.cmake:24 (find_package):
  Could not find a configuration file for package "LLVM" that is compatible
  with requested version "6.0".

  The following configuration files were considered but not accepted:

    /usr/lib64/cmake/llvm/LLVMConfig.cmake, version: 7.0.0
    /lib64/cmake/llvm/LLVMConfig.cmake, version: 7.0.0

Call Stack (most recent call first):
  src/gandiva/CMakeLists.txt:25 (find_package)
{code}
Fedora provides alternate llvm versions installed in subdirectories, so I tried specifying {{LLVM_DIR}} when invoking cmake.
{code}
$ ls /usr/lib64/llvm6.0/
bin include lib
$ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
    -DARROW_PARQUET=on -DARROW_ORC=ON -DARROW_PLASMA=on -DARROW_GANDIVA=ON \
    -DLLVM_DIR=/usr/lib64/llvm6.0 ../
[...]
CMake Error at cmake_modules/FindLLVM.cmake:24 (find_package):
  Could not find a configuration file for package "LLVM" that is compatible
  with requested version "6.0".

  The following configuration files were considered but not accepted:

    /usr/lib64/cmake/llvm/LLVMConfig.cmake, version: 7.0.0
    /lib64/cmake/llvm/LLVMConfig.cmake, version: 7.0.0

Call Stack (most recent call first):
  src/gandiva/CMakeLists.txt:25 (find_package)
{code}
So I patched {{find_library}} (see [^arrow-cmake-findllvm.patch]); that fixes the LLVM issue, but then I encounter the following Java issue:
{code}
$ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
    -DARROW_PARQUET=on -DARROW_ORC=ON -DARROW_PLASMA=on -DARROW_GANDIVA=ON \
    -DLLVM_DIR=/usr/lib64/llvm6.0 ../
[...]
-- Found LLVM 6.0.1
-- Using LLVMConfig.cmake in: /usr/lib64/llvm6.0/lib/cmake/llvm
-- Found clang /usr/lib64/ccache/clang
-- Found llvm-link /usr/lib64/llvm6.0/bin/llvm-link
CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
  Could NOT find JNI (missing: JAVA_AWT_LIBRARY JAVA_INCLUDE_PATH
  JAVA_INCLUDE_PATH2 JAVA_AWT_INCLUDE_PATH)
Call Stack (most recent call first):
  /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake/Modules/FindJNI.cmake:356 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
  src/gandiva/jni/CMakeLists.txt:21 (find_package)
{code}
My Java setup:
{code}
$ echo $JAVA_HOME
/etc/alternatives/jre_openjdk
$ $JAVA_HOME/bin/java -version
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
$ rpm -qa \*jni\* | sort
hawtjni-1.16-3.fc29.noarch
hawtjni-runtime-1.16-3.fc29.noarch
$ rpm -qa \*java\* | sort
java-11-openjdk-headless-11.0.1.13-4.fc29.x86_64
java-1.8.0-openjdk-headless-1.8.0.191.b12-8.fc29.x86_64
java-openjdk-headless-10.0.2.13-7.fc29.x86_64
javapackages-filesystem-5.3.0-1.fc29.noarch
javapackages-tools-5.3.0-1.fc29.noarch
tzdata-java-2018g-1.fc29.noarch
{code}
Unfortunately, I cannot easily compare F28 and F29 as I never have access to them simultaneously.
[jira] [Updated] (ARROW-3874) [Gandiva] Cannot build: LLVM not detected correctly
[ https://issues.apache.org/jira/browse/ARROW-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suvayu Ali updated ARROW-3874: -- Attachment: arrow-cmake-findllvm.patch > [Gandiva] Cannot build: LLVM not detected correctly > --- > > Key: ARROW-3874 > URL: https://issues.apache.org/jira/browse/ARROW-3874 > Project: Apache Arrow > Issue Type: Bug > Components: Gandiva >Affects Versions: 0.12.0 > Environment: Fedora 29, master (1013a1dc) > gcc (GCC) 8.2.1 20181105 (Red Hat 8.2.1-5) > llvm 7.0.0 (default) and 6.0.1 (parallel installed package from Fedora repos) > cmake version 3.12.1 >Reporter: Suvayu Ali >Priority: Major > Labels: cmake > Attachments: CMakeError.log, CMakeOutput.log, > arrow-cmake-findllvm.patch > > > I cannot build Arrow with {{-DARROW_GANDIVA=ON}}. {{cmake}} fails while > detecting LLVM on the system. > {code} > $ cd build/data-an/arrow/arrow/cpp/ > $ export ARROW_HOME=/opt/data-an > $ mkdir release > $ cd release/ > $ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$ARROW_HOME > -DARROW_GANDIVA=ON ../ > [...] > -- Found LLVM 6.0.1 > -- Using LLVMConfig.cmake in: /usr/lib64/cmake/llvm > CMake Error at /usr/lib64/cmake/llvm/LLVM-Config.cmake:175 (message): > Target X86 is not in the set of libraries. > Call Stack (most recent call first): > cmake_modules/FindLLVM.cmake:31 (llvm_map_components_to_libnames) > src/gandiva/CMakeLists.txt:25 (find_package) > -- Configuring incomplete, errors occurred! > {code} > The cmake log files are attached. > When I invoke cmake with options other than *Gandiva*, it finishes > successfully. > Here are the llvm libraries that are installed on my system: > {code} > $ rpm -qa llvm\* | sort > llvm3.9-libs-3.9.1-13.fc28.x86_64 > llvm4.0-libs-4.0.1-5.fc28.x86_64 > llvm-6.0.1-8.fc28.x86_64 > llvm-devel-6.0.1-8.fc28.x86_64 > llvm-libs-6.0.1-8.fc28.i686 > llvm-libs-6.0.1-8.fc28.x86_64 > $ ls /usr/lib64/libLLVM* /usr/include/llvm > /usr/lib64/libLLVM-6.0.1.so /usr/lib64/libLLVM-6.0.so /usr/lib64/libLLVM.so > /usr/include/llvm: > ADT FuzzMutate Object Support > Analysis InitializePasses.h ObjectYAML TableGen > AsmParser IR Option Target > BinaryFormat IRReader PassAnalysisSupport.h Testing > Bitcode LineEditor Passes ToolDrivers > CodeGen LinkAllIR.h Pass.h Transforms > Config LinkAllPasses.h PassInfo.h WindowsManifest > DebugInfo Linker PassRegistry.h WindowsResource > Demangle LTO PassSupport.h XRay > ExecutionEngine MC ProfileData > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3874) [Gandiva] Cannot build: LLVM not detected correctly
[ https://issues.apache.org/jira/browse/ARROW-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suvayu Ali updated ARROW-3874: -- Environment: Fedora 29, master (1013a1dc) gcc (GCC) 8.2.1 20181105 (Red Hat 8.2.1-5) llvm 7.0.0 (default) and 6.0.1 (parallel installed package from Fedora repos) cmake version 3.12.1 was: Fedora 29, master (1013a1dc) gcc (GCC) 8.2.1 20181105 (Red Hat 8.2.1-5) llvm (7.0.0 and 6.0.1) > [Gandiva] Cannot build: LLVM not detected correctly > --- > > Key: ARROW-3874 > URL: https://issues.apache.org/jira/browse/ARROW-3874 > Project: Apache Arrow > Issue Type: Bug > Components: Gandiva >Affects Versions: 0.12.0 > Environment: Fedora 29, master (1013a1dc) > gcc (GCC) 8.2.1 20181105 (Red Hat 8.2.1-5) > llvm 7.0.0 (default) and 6.0.1 (parallel installed package from Fedora repos) > cmake version 3.12.1 >Reporter: Suvayu Ali >Priority: Major > Labels: cmake > Attachments: CMakeError.log, CMakeOutput.log > > > I cannot build Arrow with {{-DARROW_GANDIVA=ON}}. {{cmake}} fails while > detecting LLVM on the system. > {code} > $ cd build/data-an/arrow/arrow/cpp/ > $ export ARROW_HOME=/opt/data-an > $ mkdir release > $ cd release/ > $ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$ARROW_HOME > -DARROW_GANDIVA=ON ../ > [...] > -- Found LLVM 6.0.1 > -- Using LLVMConfig.cmake in: /usr/lib64/cmake/llvm > CMake Error at /usr/lib64/cmake/llvm/LLVM-Config.cmake:175 (message): > Target X86 is not in the set of libraries. > Call Stack (most recent call first): > cmake_modules/FindLLVM.cmake:31 (llvm_map_components_to_libnames) > src/gandiva/CMakeLists.txt:25 (find_package) > -- Configuring incomplete, errors occurred! > {code} > The cmake log files are attached. > When I invoke cmake with options other than *Gandiva*, it finishes > successfully. > Here are the llvm libraries that are installed on my system: > {code} > $ rpm -qa llvm\* | sort > llvm3.9-libs-3.9.1-13.fc28.x86_64 > llvm4.0-libs-4.0.1-5.fc28.x86_64 > llvm-6.0.1-8.fc28.x86_64 > llvm-devel-6.0.1-8.fc28.x86_64 > llvm-libs-6.0.1-8.fc28.i686 > llvm-libs-6.0.1-8.fc28.x86_64 > $ ls /usr/lib64/libLLVM* /usr/include/llvm > /usr/lib64/libLLVM-6.0.1.so /usr/lib64/libLLVM-6.0.so /usr/lib64/libLLVM.so > /usr/include/llvm: > ADT FuzzMutate Object Support > Analysis InitializePasses.h ObjectYAML TableGen > AsmParser IR Option Target > BinaryFormat IRReader PassAnalysisSupport.h Testing > Bitcode LineEditor Passes ToolDrivers > CodeGen LinkAllIR.h Pass.h Transforms > Config LinkAllPasses.h PassInfo.h WindowsManifest > DebugInfo Linker PassRegistry.h WindowsResource > Demangle LTO PassSupport.h XRay > ExecutionEngine MC ProfileData > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3874) [Gandiva] Cannot build: LLVM not detected correctly
[ https://issues.apache.org/jira/browse/ARROW-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699406#comment-16699406 ] Suvayu Ali commented on ARROW-3874: --- Thanks for the link. In the meantime, I tried to build with *Gandiva* on Fedora 29, and it failed to detect LLVM (my original attempt was on F28, which was resolved by installing the static libraries). On F29 the default version is 7, while other versions like 6.0 are installed in subdirectories (e.g. {{/usr/lib64/llvm6.0}}). Setting {{-DLLVM_DIR=/path}} doesn't help; I had to add {{LLVM_DIR}} to {{find_package}} in {{FindLLVM.cmake}}. While the edit resolved the LLVM issue, cmake failed again, unable to find {{JAVA_AWT_JNI}} (don't remember exactly, not on F29 now). I couldn't figure out if it was something missing, or if cmake failed to detect it again. I'm unsure how to report this: do I update this bug report and change the platform from F28 to F29, or do I close this and open a fresh one? > [Gandiva] Cannot build: LLVM not detected correctly > --- > > Key: ARROW-3874 > URL: https://issues.apache.org/jira/browse/ARROW-3874 > Project: Apache Arrow > Issue Type: Bug > Components: Gandiva >Affects Versions: 0.12.0 > Environment: Fedora 28, master (8d5bfc65) > gcc (GCC) 8.2.1 20181105 (Red Hat 8.2.1-5) > llvm 6.0.1 >Reporter: Suvayu Ali >Priority: Major > Labels: cmake > Attachments: CMakeError.log, CMakeOutput.log > > > I cannot build Arrow with {{-DARROW_GANDIVA=ON}}. {{cmake}} fails while > detecting LLVM on the system. > {code} > $ cd build/data-an/arrow/arrow/cpp/ > $ export ARROW_HOME=/opt/data-an > $ mkdir release > $ cd release/ > $ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$ARROW_HOME > -DARROW_GANDIVA=ON ../ > [...] > -- Found LLVM 6.0.1 > -- Using LLVMConfig.cmake in: /usr/lib64/cmake/llvm > CMake Error at /usr/lib64/cmake/llvm/LLVM-Config.cmake:175 (message): > Target X86 is not in the set of libraries. > Call Stack (most recent call first): > cmake_modules/FindLLVM.cmake:31 (llvm_map_components_to_libnames) > src/gandiva/CMakeLists.txt:25 (find_package) > -- Configuring incomplete, errors occurred! > {code} > The cmake log files are attached. > When I invoke cmake with options other than *Gandiva*, it finishes > successfully. > Here are the llvm libraries that are installed on my system: > {code} > $ rpm -qa llvm\* | sort > llvm3.9-libs-3.9.1-13.fc28.x86_64 > llvm4.0-libs-4.0.1-5.fc28.x86_64 > llvm-6.0.1-8.fc28.x86_64 > llvm-devel-6.0.1-8.fc28.x86_64 > llvm-libs-6.0.1-8.fc28.i686 > llvm-libs-6.0.1-8.fc28.x86_64 > $ ls /usr/lib64/libLLVM* /usr/include/llvm > /usr/lib64/libLLVM-6.0.1.so /usr/lib64/libLLVM-6.0.so /usr/lib64/libLLVM.so > /usr/include/llvm: > ADT FuzzMutate Object Support > Analysis InitializePasses.h ObjectYAML TableGen > AsmParser IR Option Target > BinaryFormat IRReader PassAnalysisSupport.h Testing > Bitcode LineEditor Passes ToolDrivers > CodeGen LinkAllIR.h Pass.h Transforms > Config LinkAllPasses.h PassInfo.h WindowsManifest > DebugInfo Linker PassRegistry.h WindowsResource > Demangle LTO PassSupport.h XRay > ExecutionEngine MC ProfileData > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
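[Editor's note] For reference, a minimal sketch of what such an edit to {{FindLLVM.cmake}} might look like, assuming the stock module calls {{find_package(LLVM ... CONFIG)}} without search hints (the actual upstream code may differ; see the attached patch for what was really done):
{code}
# Hypothetical FindLLVM.cmake tweak: forward the user-supplied LLVM_DIR as a
# search hint so a parallel-installed LLVM (e.g. /usr/lib64/llvm6.0) is
# preferred over the system default in /usr/lib64/cmake/llvm.
find_package(LLVM 6.0 REQUIRED CONFIG
             HINTS "${LLVM_DIR}" "${LLVM_DIR}/lib/cmake/llvm")
{code}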
[jira] [Comment Edited] (ARROW-3874) [Gandiva] Cannot build: LLVM not detected correctly
[ https://issues.apache.org/jira/browse/ARROW-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698439#comment-16698439 ] Suvayu Ali edited comment on ARROW-3874 at 11/26/18 3:13 AM: - I had installed {{llvm-devel}} using dnf. cmake worked fine after installing {{llvm-static}}. Thanks! But during the build I also noticed that many already-installed libraries are being downloaded: {code:java} [ 2%] Performing download step (download, verify and extract) for 'protobuf_ep' [ 2%] Performing download step (download, verify and extract) for 'thrift_ep' {code} I have these installed: {code:java} $ rpm -qa thrift\* protobuf\* protobuf-3.5.0-4.fc28.x86_64 protobuf-compiler-3.5.0-4.fc28.x86_64 protobuf-java-3.5.0-4.fc28.noarch protobuf-c-1.3.0-4.fc28.x86_64 protobuf-devel-3.5.0-4.fc28.x86_64 protobuf-lite-3.5.0-4.fc28.x86_64 thrift-devel-0.10.0-9.fc28.x86_64 thrift-0.10.0-9.fc28.x86_64 {code} Am I missing some libraries there as well? > [Gandiva] Cannot build: LLVM not detected correctly > --- > > Key: ARROW-3874 > URL: https://issues.apache.org/jira/browse/ARROW-3874 > Project: Apache Arrow > Issue Type: Bug > Components: Gandiva >Affects Versions: 0.12.0 > Environment: Fedora 28, master (8d5bfc65) > gcc (GCC) 8.2.1 20181105 (Red Hat 8.2.1-5) > llvm 6.0.1 >Reporter: Suvayu Ali >Priority: Major > Labels: cmake > Attachments: CMakeError.log, CMakeOutput.log > > > I cannot build Arrow with {{-DARROW_GANDIVA=ON}}. {{cmake}} fails while > detecting LLVM on the system. > {code} > $ cd build/data-an/arrow/arrow/cpp/ > $ export ARROW_HOME=/opt/data-an > $ mkdir release > $ cd release/ > $ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$ARROW_HOME > -DARROW_GANDIVA=ON ../ > [...] > -- Found LLVM 6.0.1 > -- Using LLVMConfig.cmake in: /usr/lib64/cmake/llvm > CMake Error at /usr/lib64/cmake/llvm/LLVM-Config.cmake:175 (message): > Target X86 is not in the set of libraries. > Call Stack (most recent call first): > cmake_modules/FindLLVM.cmake:31 (llvm_map_components_to_libnames) > src/gandiva/CMakeLists.txt:25 (find_package) > -- Configuring incomplete, errors occurred! > {code} > The cmake log files are attached. > When I invoke cmake with options other than *Gandiva*, it finishes successfully. 
> Here are the llvm libraries that are installed on my system: > {code} > $ rpm -qa llvm\* | sort > llvm3.9-libs-3.9.1-13.fc28.x86_64 > llvm4.0-libs-4.0.1-5.fc28.x86_64 > llvm-6.0.1-8.fc28.x86_64 > llvm-devel-6.0.1-8.fc28.x86_64 > llvm-libs-6.0.1-8.fc28.i686 > llvm-libs-6.0.1-8.fc28.x86_64 > $ ls /usr/lib64/libLLVM* /usr/include/llvm > /usr/lib64/libLLVM-6.0.1.so /usr/lib64/libLLVM-6.0.so /usr/lib64/libLLVM.so > /usr/include/llvm: > ADT FuzzMutate Object Support > Analysis InitializePasses.h ObjectYAML TableGen > AsmParser IR Option Target > BinaryFormat IRReader PassAnalysisSupport.h Testing > Bitcode LineEditor Passes ToolDrivers > CodeGen LinkAllIR.h Pass.h Transforms > Config LinkAllPasses.h PassInfo.h WindowsManifest > DebugInfo Linker PassRegistry.h WindowsResource > Demangle LTO PassSupport.h XRay > ExecutionEngine MC ProfileData > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3874) [Gandiva] Cannot build: LLVM not detected correctly
[ https://issues.apache.org/jira/browse/ARROW-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698439#comment-16698439 ] Suvayu Ali commented on ARROW-3874: --- I had installed {{llvm-devel}} using dnf. cmake worked fine after installing {{llvm-static}}. Thanks! But during the build I also noticed that many already-installed libraries are being downloaded: {code:java} [ 2%] Performing download step (download, verify and extract) for 'protobuf_ep' [ 2%] Performing download step (download, verify and extract) for 'thrift_ep' {code} I have these installed: {code:java} $ rpm -qa thrift\* protobuf\* protobuf-3.5.0-4.fc28.x86_64 protobuf-compiler-3.5.0-4.fc28.x86_64 protobuf-java-3.5.0-4.fc28.noarch protobuf-c-1.3.0-4.fc28.x86_64 protobuf-devel-3.5.0-4.fc28.x86_64 protobuf-lite-3.5.0-4.fc28.x86_64 thrift-devel-0.10.0-9.fc28.x86_64 thrift-0.10.0-9.fc28.x86_64 {code} Am I missing some libraries there as well? > [Gandiva] Cannot build: LLVM not detected correctly > --- > > Key: ARROW-3874 > URL: https://issues.apache.org/jira/browse/ARROW-3874 > Project: Apache Arrow > Issue Type: Bug > Components: Gandiva >Affects Versions: 0.12.0 > Environment: Fedora 28, master (8d5bfc65) > gcc (GCC) 8.2.1 20181105 (Red Hat 8.2.1-5) > llvm 6.0.1 >Reporter: Suvayu Ali >Priority: Major > Labels: cmake > Attachments: CMakeError.log, CMakeOutput.log > > > I cannot build Arrow with {{-DARROW_GANDIVA=ON}}. {{cmake}} fails while > detecting LLVM on the system. > {code} > $ cd build/data-an/arrow/arrow/cpp/ > $ export ARROW_HOME=/opt/data-an > $ mkdir release > $ cd release/ > $ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$ARROW_HOME > -DARROW_GANDIVA=ON ../ > [...] > -- Found LLVM 6.0.1 > -- Using LLVMConfig.cmake in: /usr/lib64/cmake/llvm > CMake Error at /usr/lib64/cmake/llvm/LLVM-Config.cmake:175 (message): > Target X86 is not in the set of libraries. > Call Stack (most recent call first): > cmake_modules/FindLLVM.cmake:31 (llvm_map_components_to_libnames) > src/gandiva/CMakeLists.txt:25 (find_package) > -- Configuring incomplete, errors occurred! > {code} > The cmake log files are attached. > When I invoke cmake with options other than *Gandiva*, it finishes > successfully. > Here are the llvm libraries that are installed on my system: > {code} > $ rpm -qa llvm\* | sort > llvm3.9-libs-3.9.1-13.fc28.x86_64 > llvm4.0-libs-4.0.1-5.fc28.x86_64 > llvm-6.0.1-8.fc28.x86_64 > llvm-devel-6.0.1-8.fc28.x86_64 > llvm-libs-6.0.1-8.fc28.i686 > llvm-libs-6.0.1-8.fc28.x86_64 > $ ls /usr/lib64/libLLVM* /usr/include/llvm > /usr/lib64/libLLVM-6.0.1.so /usr/lib64/libLLVM-6.0.so /usr/lib64/libLLVM.so > /usr/include/llvm: > ADT FuzzMutate Object Support > Analysis InitializePasses.h ObjectYAML TableGen > AsmParser IR Option Target > BinaryFormat IRReader PassAnalysisSupport.h Testing > Bitcode LineEditor Passes ToolDrivers > CodeGen LinkAllIR.h Pass.h Transforms > Config LinkAllPasses.h PassInfo.h WindowsManifest > DebugInfo Linker PassRegistry.h WindowsResource > Demangle LTO PassSupport.h XRay > ExecutionEngine MC ProfileData > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
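[Editor's note] As for the bundled downloads: Arrow's third-party toolchain of this era consulted per-dependency environment variables when deciding whether to build a dependency from source. A hedged sketch, assuming the {{*_HOME}} variables are honoured by {{cpp/cmake_modules/ThirdpartyToolchain.cmake}} (worth verifying against the checked-out source):
{code}
# Point the toolchain at the system packages so it does not download its own
# copies; /usr is where Fedora's -devel packages put headers and libraries.
$ export PROTOBUF_HOME=/usr
$ export THRIFT_HOME=/usr
$ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
  -DARROW_GANDIVA=ON ../
{code}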
[jira] [Created] (ARROW-3874) [Gandiva] Cannot build: LLVM not detected
Suvayu Ali created ARROW-3874: - Summary: [Gandiva] Cannot build: LLVM not detected Key: ARROW-3874 URL: https://issues.apache.org/jira/browse/ARROW-3874 Project: Apache Arrow Issue Type: Bug Components: Gandiva Affects Versions: 0.12.0 Environment: Fedora 28, master (8d5bfc65) gcc (GCC) 8.2.1 20181105 (Red Hat 8.2.1-5) llvm 6.0.1 Reporter: Suvayu Ali Attachments: CMakeError.log, CMakeOutput.log I cannot build Arrow with {{-DARROW_GANDIVA=ON}}. {{cmake}} fails while detecting LLVM on the system. {code} $ cd build/data-an/arrow/arrow/cpp/ $ export ARROW_HOME=/opt/data-an $ mkdir release $ cd release/ $ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$ARROW_HOME -DARROW_GANDIVA=ON ../ [...] -- Found LLVM 6.0.1 -- Using LLVMConfig.cmake in: /usr/lib64/cmake/llvm CMake Error at /usr/lib64/cmake/llvm/LLVM-Config.cmake:175 (message): Target X86 is not in the set of libraries. Call Stack (most recent call first): cmake_modules/FindLLVM.cmake:31 (llvm_map_components_to_libnames) src/gandiva/CMakeLists.txt:25 (find_package) -- Configuring incomplete, errors occurred! {code} The cmake log files are attached. When I invoke cmake with options other than *Gandiva*, it finishes successfully. Here are the llvm libraries that are installed on my system: {code} $ rpm -qa llvm\* | sort llvm3.9-libs-3.9.1-13.fc28.x86_64 llvm4.0-libs-4.0.1-5.fc28.x86_64 llvm-6.0.1-8.fc28.x86_64 llvm-devel-6.0.1-8.fc28.x86_64 llvm-libs-6.0.1-8.fc28.i686 llvm-libs-6.0.1-8.fc28.x86_64 $ ls /usr/lib64/libLLVM* /usr/include/llvm /usr/lib64/libLLVM-6.0.1.so /usr/lib64/libLLVM-6.0.so /usr/lib64/libLLVM.so /usr/include/llvm: ADT FuzzMutate Object Support Analysis InitializePasses.h ObjectYAML TableGen AsmParser IR Option Target BinaryFormat IRReader PassAnalysisSupport.h Testing Bitcode LineEditor Passes ToolDrivers CodeGen LinkAllIR.h Pass.h Transforms Config LinkAllPasses.h PassInfo.h WindowsManifest DebugInfo Linker PassRegistry.h WindowsResource Demangle LTO PassSupport.h XRay ExecutionEngine MC ProfileData {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3874) [Gandiva] Cannot build: LLVM not detected correctly
[ https://issues.apache.org/jira/browse/ARROW-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suvayu Ali updated ARROW-3874: -- Summary: [Gandiva] Cannot build: LLVM not detected correctly (was: [Gandiva] Cannot build: LLVM not detected) > [Gandiva] Cannot build: LLVM not detected correctly > --- > > Key: ARROW-3874 > URL: https://issues.apache.org/jira/browse/ARROW-3874 > Project: Apache Arrow > Issue Type: Bug > Components: Gandiva >Affects Versions: 0.12.0 > Environment: Fedora 28, master (8d5bfc65) > gcc (GCC) 8.2.1 20181105 (Red Hat 8.2.1-5) > llvm 6.0.1 >Reporter: Suvayu Ali >Priority: Major > Labels: cmake > Attachments: CMakeError.log, CMakeOutput.log > > > I cannot build Arrow with {{-DARROW_GANDIVA=ON}}. {{cmake}} fails while > detecting LLVM on the system. > {code} > $ cd build/data-an/arrow/arrow/cpp/ > $ export ARROW_HOME=/opt/data-an > $ mkdir release > $ cd release/ > $ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$ARROW_HOME > -DARROW_GANDIVA=ON ../ > [...] > -- Found LLVM 6.0.1 > -- Using LLVMConfig.cmake in: /usr/lib64/cmake/llvm > CMake Error at /usr/lib64/cmake/llvm/LLVM-Config.cmake:175 (message): > Target X86 is not in the set of libraries. > Call Stack (most recent call first): > cmake_modules/FindLLVM.cmake:31 (llvm_map_components_to_libnames) > src/gandiva/CMakeLists.txt:25 (find_package) > -- Configuring incomplete, errors occurred! > {code} > The cmake log files are attached. > When I invoke cmake with options other than *Gandiva*, it finishes > successfully. > Here are the llvm libraries that are installed on my system: > {code} > $ rpm -qa llvm\* | sort > llvm3.9-libs-3.9.1-13.fc28.x86_64 > llvm4.0-libs-4.0.1-5.fc28.x86_64 > llvm-6.0.1-8.fc28.x86_64 > llvm-devel-6.0.1-8.fc28.x86_64 > llvm-libs-6.0.1-8.fc28.i686 > llvm-libs-6.0.1-8.fc28.x86_64 > $ ls /usr/lib64/libLLVM* /usr/include/llvm > /usr/lib64/libLLVM-6.0.1.so /usr/lib64/libLLVM-6.0.so /usr/lib64/libLLVM.so > /usr/include/llvm: > ADT FuzzMutate Object Support > Analysis InitializePasses.h ObjectYAML TableGen > AsmParser IR Option Target > BinaryFormat IRReader PassAnalysisSupport.h Testing > Bitcode LineEditor Passes ToolDrivers > CodeGen LinkAllIR.h Pass.h Transforms > Config LinkAllPasses.h PassInfo.h WindowsManifest > DebugInfo Linker PassRegistry.h WindowsResource > Demangle LTO PassSupport.h XRay > ExecutionEngine MC ProfileData > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3806) [Python] When converting nested types to pandas, use tuples
Suvayu Ali created ARROW-3806: - Summary: [Python] When converting nested types to pandas, use tuples Key: ARROW-3806 URL: https://issues.apache.org/jira/browse/ARROW-3806 Project: Apache Arrow Issue Type: Improvement Components: Python Affects Versions: 0.11.1 Environment: Fedora 29, pyarrow installed with conda Reporter: Suvayu Ali When converting to pandas, convert nested types (e.g. list) to tuples. Columns with lists are difficult to query. Here are a few unsuccessful attempts: {code} >>> mini CHROM POS ID REF ALTS QUAL 80 20 63521 rs191905748 G [A] 100 81 20 63541 rs117322527 C [A] 100 82 20 63548 rs541129280 G [GT] 100 83 20 63553 rs536661806 T [C] 100 84 20 63555 rs553463231 T [C] 100 85 20 63559 rs138359120 C [A] 100 86 20 63586 rs545178789 T [G] 100 87 20 63636 rs374311122 G [A] 100 88 20 63696 rs149160003 A [G] 100 89 20 63698 rs544072005 A [C] 100 90 20 63729 rs181483669 G [A] 100 91 20 63733 rs75670495 C [T] 100 92 20 63799 rs1418258 C [T] 100 93 20 63808 rs76004960 G [C] 100 94 20 63813 rs532151719 G [A] 100 95 20 63857 rs543686274 CCTGGAAAGGATT [C] 100 96 20 63865 rs551938596 G [A] 100 97 20 63902 rs571779099 A [T] 100 98 20 63963 rs531152674 G [A] 100 99 20 63967 rs116770801 A [G] 100 100 20 63977 rs199703510 C [G] 100 101 20 64016 rs143263863 G [A] 100 102 20 64062 rs148297240 G [A] 100 103 20 64139 rs186497980 G [A, T] 100 104 20 64150 rs7274499 C [A] 100 105 20 64151 rs190945171 C [T] 100 106 20 64154 rs537656456 T [G] 100 107 20 64175 rs116531220 A [G] 100 108 20 64186 rs141793347 C [G] 100 109 20 64210 rs182418654 G [C] 100 110 20 64303 rs559929739 C [A] 100 {code} # I think this one fails because it tries to broadcast the comparison. {code} >>> mini[mini.ALTS == ["A", "T"]] Traceback (most recent call last): File "", line 1, in File "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/ops.py", line 1283, in wrapper res = na_op(values, other) File "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/ops.py", line 1143, in na_op result = _comp_method_OBJECT_ARRAY(op, x, y) File "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/ops.py", line 1120, in _comp_method_OBJECT_ARRAY result = libops.vec_compare(x, y, op) File "pandas/_libs/ops.pyx", line 128, in pandas._libs.ops.vec_compare ValueError: Arrays were different lengths: 31 vs 2 {code} # I think this fails due to a similar reason, but the broadcasting is happening at a different place. 
{code} >>> mini[mini.ALTS.apply(lambda x: x == ["A", "T"])] Traceback (most recent call last): File "", line 1, in File "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2682, in __getitem__ return self._getitem_array(key) File "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2726, in _getitem_array indexer = self.loc._convert_to_indexer(key, axis=1) File "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1314, in _convert_to_indexer indexer = check = labels.get_indexer(objarr) File "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3259, in get_indexer indexer = self._engine.get_indexer(target._ndarray_values) File "pandas/_libs/index.pyx", line 301, in pandas._libs.index.IndexEngine.get_indexer File "pandas/_libs/hashtable_class_helper.pxi", line 1544, in pandas._libs.hashtable.PyObjectHashTable.lookup TypeError: unhashable type: 'numpy.ndarray' >>> mini.ALTS.apply(lambda x: x == ["A", "T"]).head() 80 [True, False] 81 [True, False] 82 [False, False] 83 [False, False] 84 [False, False] {code} # Unfortunately this clever hack fails as well! {code} >>> c = np.empty(1, object) >>> c[0] = ["A", "T"] >>> mini[mini.ALTS.values == c] Traceback (most recent call last): File "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc return self._engine.get_loc(key) File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc File
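[Editor's note] The tuple conversion requested above can be emulated on the pandas side today, which also shows why tuples are the right target type; a sketch with a toy frame standing in for the {{mini}} data:
{code:python}
import pandas as pd

# toy stand-in for the ALTS column in the report
mini = pd.DataFrame({"ID": ["rs1", "rs2", "rs3"],
                     "ALTS": [["A"], ["A", "T"], ["G"]]})

# tuples are hashable, so hash-based queries such as isin() work,
# unlike lists or numpy arrays
alts = mini["ALTS"].apply(tuple)
print(mini[alts.isin([("A", "T")])])
{code}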
[jira] [Updated] (ARROW-3792) [PARQUET] Segmentation fault when writing empty RecordBatches
[ https://issues.apache.org/jira/browse/ARROW-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suvayu Ali updated ARROW-3792: -- Description: h2. Background I am trying to convert a very sparse dataset to parquet (~3% of rows in a range are populated). The file I am working with spans up to ~63M rows. I decided to iterate in batches of 500k rows, 127 batches in total. Each row batch is a {{RecordBatch}}. I create 4 batches at a time, and write to a parquet file incrementally. Something like this: {code:python} batches = [..] # 4 batches tbl = pa.Table.from_batches(batches) pqwriter.write_table(tbl, row_group_size=15000) # same issue with pq.write_table(..) {code} I was getting a segmentation fault at the final step; I narrowed it down to a specific iteration. I noticed that iteration had empty batches; specifically, [0, 0, 2876, 14423]. The number of rows for each {{RecordBatch}} for the whole dataset is below: {code:python} [14050, 16398, 14080, 14920, 15527, 14288, 15040, 14733, 15345, 15799, 15728, 15942, 14734, 15241, 15721, 15255, 14167, 14009, 13753, 14800, 14554, 14287, 15393, 14766, 16600, 15675, 14072, 13263, 12906, 14167, 14455, 15428, 15129, 16141, 15478, 16257, 14639, 14887, 14919, 15535, 13973, 14334, 13286, 15038, 15951, 17252, 15883, 19903, 16967, 16878, 15845, 12205, 8761, 0, 0, 0, 0, 0, 2876, 14423, 13557, 12723, 14330, 15452, 13551, 12723, 12396, 13531, 13539, 11512, 13175, 13941, 14634, 15515, 14239, 13856, 13873, 14154, 14822, 13543, 14653, 15328, 16171, 15101, 15055, 15194, 14058, 13706, 14747, 14650, 14694, 15397, 15122, 16055, 16635, 14153, 14665, 14781, 15462, 15426, 16150, 14632, 14532, 15139, 15324, 15279, 16075, 16394, 16834, 15391, 16320, 16504, 17248, 15913, 15341, 14754, 16637, 15695, 16642, 18143, 19481, 19072, 15742, 18807, 18789, 14258, 0, 0] {code} On excluding the empty {{RecordBatch}}-es, the segfault goes away, but unfortunately I couldn't create a proper minimal example with synthetic data. h2. Not quite minimal example The data I am using is from the 1000 Genomes Project, which has been public for many years, so we can be reasonably sure the data is good. The following steps should help you replicate the issue. # Download the data file (and index), about 330MB: {code:bash} $ wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr20.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz{,.tbi} {code} # Install the Cython library {{pysam}}, a thin wrapper around the reference implementation of the VCF file spec. You will need {{zlib}} headers, but that's probably not a problem :) {code:bash} $ pip3 install --user pysam {code} # Now you can use the attached script to replicate the crash. h2. Extra information I have tried attaching gdb; the backtrace when the segfault occurs is shown below (maybe it helps; this is how I realised empty batches could be the reason). 
{code} (gdb) bt #0 0x7f3e7676d670 in parquet::TypedColumnWriter >::WriteMiniBatch(long, short const*, short const*, parquet::ByteArray const*) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.11 #1 0x7f3e76733d1e in arrow::Status parquet::arrow::(anonymous namespace)::ArrowColumnWriter::TypedWriteBatch, arrow::BinaryType>(arrow::Array const&, long, short const*, short const*) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.11 #2 0x7f3e7673a3d4 in parquet::arrow::(anonymous namespace)::ArrowColumnWriter::Write(arrow::Array const&) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.11 #3 0x7f3e7673df09 in parquet::arrow::FileWriter::Impl::WriteColumnChunk(std::shared_ptr const&, long, long) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.11 #4 0x7f3e7673c74d in parquet::arrow::FileWriter::WriteColumnChunk(std::shared_ptr const&, long, long) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.11 #5 0x7f3e7673c8d2 in parquet::arrow::FileWriter::WriteTable(arrow::Table const&, long) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.11 #6 0x7f3e731e3a51 in __pyx_pw_7pyarrow_8_parquet_13ParquetWriter_5write_table(_object*, _object*, _object*) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/_parquet.cpython-36m-x86_64-linux-gnu.so {code}
[jira] [Updated] (ARROW-3792) [PARQUET] Segmentation fault when writing empty RecordBatches
[ https://issues.apache.org/jira/browse/ARROW-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suvayu Ali updated ARROW-3792: -- Description: h2. Background I am trying to convert a very sparse dataset to parquet (~3% of rows in a range are populated). The file I am working with spans up to ~63M rows. I decided to iterate in batches of 500k rows, 127 batches in total. Each row batch is a {{RecordBatch}}. I create 4 batches at a time, and write to a parquet file incrementally. Something like this: {code:python} batches = [..] # 4 batches tbl = pa.Table.from_batches(batches) pqwriter.write_table(tbl, row_group_size=15000) # same issue with pq.write_table(..) {code} I was getting a segmentation fault at the final step; I narrowed it down to a specific iteration. I noticed that iteration had empty batches; specifically, [0, 0, 2876, 14423]. The number of rows for each {{RecordBatch}} for the whole dataset is below: {code:python} [14050, 16398, 14080, 14920, 15527, 14288, 15040, 14733, 15345, 15799, 15728, 15942, 14734, 15241, 15721, 15255, 14167, 14009, 13753, 14800, 14554, 14287, 15393, 14766, 16600, 15675, 14072, 13263, 12906, 14167, 14455, 15428, 15129, 16141, 15478, 16257, 14639, 14887, 14919, 15535, 13973, 14334, 13286, 15038, 15951, 17252, 15883, 19903, 16967, 16878, 15845, 12205, 8761, 0, 0, 0, 0, 0, 2876, 14423, 13557, 12723, 14330, 15452, 13551, 12723, 12396, 13531, 13539, 11512, 13175, 13941, 14634, 15515, 14239, 13856, 13873, 14154, 14822, 13543, 14653, 15328, 16171, 15101, 15055, 15194, 14058, 13706, 14747, 14650, 14694, 15397, 15122, 16055, 16635, 14153, 14665, 14781, 15462, 15426, 16150, 14632, 14532, 15139, 15324, 15279, 16075, 16394, 16834, 15391, 16320, 16504, 17248, 15913, 15341, 14754, 16637, 15695, 16642, 18143, 19481, 19072, 15742, 18807, 18789, 14258, 0, 0] {code} On excluding the empty {{RecordBatch}}-es, the segfault goes away, but unfortunately I couldn't create a proper minimal example with synthetic data. h2. Not quite minimal example The data I am using is from the 1000 Genomes Project, which has been public for many years, so we can be reasonably sure the data is good. The following steps should help you replicate the issue. # Download the data file (and index), about 330MB: {code:bash} $ wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr20.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz{,.tbi} {code} # Install the Cython library {{pysam}}, a thin wrapper around the reference implementation of the VCF file spec. You will need {{zlib}} headers, but that's probably not a problem :) {code:bash} $ pip3 install --user pysam {code} # Now you can use the attached script to replicate the crash. h2. Extra information I have tried attaching gdb; the backtrace when the segfault occurs is shown below (maybe it helps; this is how I realised empty batches could be the reason). 
{code} (gdb) bt #0 0x7f3e7676d670 in parquet::TypedColumnWriter >::WriteMiniBatch(long, short const*, short const*, parquet::ByteArray const*) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.11 #1 0x7f3e76733d1e in arrow::Status parquet::arrow::(anonymous namespace)::ArrowColumnWriter::TypedWriteBatch, arrow::BinaryType>(arrow::Array const&, long, short const*, short const*) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.11 #2 0x7f3e7673a3d4 in parquet::arrow::(anonymous namespace)::ArrowColumnWriter::Write(arrow::Array const&) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.11 #3 0x7f3e7673df09 in parquet::arrow::FileWriter::Impl::WriteColumnChunk(std::shared_ptr const&, long, long) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.11 #4 0x7f3e7673c74d in parquet::arrow::FileWriter::WriteColumnChunk(std::shared_ptr const&, long, long) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.11 #5 0x7f3e7673c8d2 in parquet::arrow::FileWriter::WriteTable(arrow::Table const&, long) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.11 #6 0x7f3e731e3a51 in __pyx_pw_7pyarrow_8_parquet_13ParquetWriter_5write_table(_object*, _object*, _object*) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/_parquet.cpython-36m-x86_64-linux-gnu.so {code}
[jira] [Created] (ARROW-3792) [PARQUET] Segmentation fault when writing empty RecordBatches
Suvayu Ali created ARROW-3792: - Summary: [PARQUET] Segmentation fault when writing empty RecordBatches Key: ARROW-3792 URL: https://issues.apache.org/jira/browse/ARROW-3792 Project: Apache Arrow Issue Type: Bug Components: Format Affects Versions: 0.11.1 Environment: Fedora 28, pyarrow installed with pip Fedora 29, pyarrow installed from conda-forge Reporter: Suvayu Ali Attachments: pq-bug.py h2. Background I am trying to convert a very sparse dataset to parquet (~3% of rows in a range are populated). The file I am working with spans up to ~63M rows. I decided to iterate in batches of 500k rows, 127 batches in total. Each row batch is a RecordBatch. I create 4 batches at a time, and write to a parquet file incrementally. Something like this: {code:python} batches = [..] # 4 batches tbl = pa.Table.from_batches(batches) pqwriter.write_table(tbl, row_group_size=15000) # same issue with pq.write_table(..) {code} I was getting a segmentation fault at the final step; I narrowed it down to a specific iteration. I noticed that iteration had empty batches; specifically, [0, 0, 2876, 14423]. The number of rows for each RecordBatch for the whole dataset is below: {code:python} [14050, 16398, 14080, 14920, 15527, 14288, 15040, 14733, 15345, 15799, 15728, 15942, 14734, 15241, 15721, 15255, 14167, 14009, 13753, 14800, 14554, 14287, 15393, 14766, 16600, 15675, 14072, 13263, 12906, 14167, 14455, 15428, 15129, 16141, 15478, 16257, 14639, 14887, 14919, 15535, 13973, 14334, 13286, 15038, 15951, 17252, 15883, 19903, 16967, 16878, 15845, 12205, 8761, 0, 0, 0, 0, 0, 2876, 14423, 13557, 12723, 14330, 15452, 13551, 12723, 12396, 13531, 13539, 11512, 13175, 13941, 14634, 15515, 14239, 13856, 13873, 14154, 14822, 13543, 14653, 15328, 16171, 15101, 15055, 15194, 14058, 13706, 14747, 14650, 14694, 15397, 15122, 16055, 16635, 14153, 14665, 14781, 15462, 15426, 16150, 14632, 14532, 15139, 15324, 15279, 16075, 16394, 16834, 15391, 16320, 16504, 17248, 15913, 15341, 14754, 16637, 15695, 16642, 18143, 19481, 19072, 15742, 18807, 18789, 14258, 0, 0] {code} On excluding the empty RecordBatch-es, the segfault goes away, but unfortunately I couldn't create a proper minimal example with synthetic data. h2. Not quite minimal example The data I am using is from the 1000 Genomes Project, which has been public for many years, so we can be reasonably sure the data is good. The following steps should help you replicate the issue. # Download the data file (and index), about 330MB: {code:bash} $ wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr20.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz{,.tbi} {code} # Install the Cython library pysam, a thin wrapper around the reference implementation of the VCF file spec. You will need zlib headers, but that's probably not a problem :) {code:bash} $ pip3 install --user pysam {code} # Now you can use the attached script to replicate the crash. h2. Extra information I have tried attaching gdb; the backtrace when the segfault occurs is shown below (maybe it helps; this is how I realised empty batches could be the reason). 
{code} (gdb) bt #0 0x7f3e7676d670 in parquet::TypedColumnWriter >::WriteMiniBatch(long, short const*, short const*, parquet::ByteArray const*) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.11 #1 0x7f3e76733d1e in arrow::Status parquet::arrow::(anonymous namespace)::ArrowColumnWriter::TypedWriteBatch, arrow::BinaryType>(arrow::Array const&, long, short const*, short const*) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.11 #2 0x7f3e7673a3d4 in parquet::arrow::(anonymous namespace)::ArrowColumnWriter::Write(arrow::Array const&) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.11 #3 0x7f3e7673df09 in parquet::arrow::FileWriter::Impl::WriteColumnChunk(std::shared_ptr const&, long, long) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.11 #4 0x7f3e7673c74d in parquet::arrow::FileWriter::WriteColumnChunk(std::shared_ptr const&, long, long) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.11 #5 0x7f3e7673c8d2 in parquet::arrow::FileWriter::WriteTable(arrow::Table const&, long) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.11 #6 0x7f3e731e3a51 in __pyx_pw_7pyarrow_8_parquet_13ParquetWriter_5write_table(_object*, _object*, _object*) () from /home/user/miniconda3/lib/python3.6/site-packages/pyarrow/_parquet.cpython-36m-x86_64-linux-gnu.so {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
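[Editor's note] Until the crash itself is fixed, the observation above suggests a workaround: drop zero-length batches before writing. A sketch, with {{batches}} and {{pqwriter}} as in the snippet from the description:
{code:python}
# empty RecordBatches trigger the segfaulting code path, so filter them out
batches = [b for b in batches if b.num_rows > 0]
if batches:
    tbl = pa.Table.from_batches(batches)
    pqwriter.write_table(tbl, row_group_size=15000)
{code}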
[jira] [Commented] (ARROW-1956) Support reading specific partitions from a partitioned parquet dataset
[ https://issues.apache.org/jira/browse/ARROW-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306537#comment-16306537 ] Suvayu Ali commented on ARROW-1956: --- Hi Wes, Inspired by the way PySpark does it, I propose the following. * Writing partitioned datasets: {code:none} writer = PartitionedParquetWriter(basepath, partitions, schema, ...) {code} The rest of the arguments could be identical to ParquetWriter. For that matter, we can also have: {code:java} writer = ParquetWriter(where, ..., compression='snappy', partitions=[]) {code} For a single file, all constructor arguments stay as they are currently, and `partitions` is ignored; however, when `where` is a directory, `partitions` must be a list of column names to partition on. * Reading partitioned datasets: {code:java} dst = ParquetDataset(path_or_paths, validate_schema=True, basepath=None) {code} When `basepath` is `None`, we have the current behaviour, whereas if `basepath` is a path, directory hierarchies are detected in `path_or_paths`, and each sub-directory is treated as a parquet partition in the usual fashion. What do you think? If there is someone to provide guidance, I can also work on the implementation. I have lots of free time from the second week of January. Thanks, > Support reading specific partitions from a partitioned parquet dataset > -- > > Key: ARROW-1956 > URL: https://issues.apache.org/jira/browse/ARROW-1956 > Project: Apache Arrow > Issue Type: Improvement > Components: Format >Affects Versions: 0.8.0 > Environment: Kernel: 4.14.8-300.fc27.x86_64 > Python: 3.6.3 >Reporter: Suvayu Ali >Priority: Minor > Labels: parquet > Fix For: 0.9.0 > > Attachments: so-example.py > > > I want to read specific partitions from a partitioned parquet dataset. This > is very useful in case of large datasets. I have attached a small script > that creates a dataset and shows what is expected when reading (quoting > salient points below). > # There is no way to read specific partitions in Pandas > # In pyarrow I tried to achieve the goal by providing a list of > files/directories to ParquetDataset, but it didn't work: > # In PySpark it works if I simply do: > {code:none} > spark.read.options('basePath', 'datadir').parquet(*list_of_partitions) > {code} > I also couldn't find a way to easily write partitioned parquet files. In the > end I did it by hand by creating the directory hierarchies, and writing the > individual files myself (similar to the implementation in the attached > script). Again, in PySpark I can do > {code:none} > df.write.partitionBy(*list_of_partitions).parquet(output) > {code} > to achieve that. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
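[Editor's note] A usage sketch of the proposal above; both keyword arguments are hypothetical and exist in neither ParquetWriter nor ParquetDataset at the time of writing:
{code:python}
import pyarrow.parquet as pq

# proposed writer: partition on two columns under a base directory
# (schema is assumed to be an existing pyarrow.Schema for the data)
writer = pq.ParquetWriter('datadir', schema, partitions=['kind', 'year'])

# proposed reader: treat sub-directories below basepath as partitions
dst = pq.ParquetDataset(['datadir/kind=a/year=2017',
                         'datadir/kind=b/year=2017'],
                        basepath='datadir')
{code}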