[ https://issues.apache.org/jira/browse/ARROW-15141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe Korn resolved ARROW-15141.
------------------------------
    Resolution: Fixed

Fixed by
 * [https://github.com/conda-forge/arrow-cpp-feedstock/pull/637]
 * [https://github.com/conda-forge/arrow-cpp-feedstock/pull/638]
 * [https://github.com/conda-forge/arrow-cpp-feedstock/pull/639]
 * [https://github.com/conda-forge/arrow-cpp-feedstock/pull/640]

> [C++] Fatal error condition occurred in aws_thread_launch
> ---------------------------------------------------------
>
>                 Key: ARROW-15141
>                 URL: https://issues.apache.org/jira/browse/ARROW-15141
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 6.0.0, 6.0.1
>         Environment: - `uname -a`:
> Linux datalab2 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
> - `mamba list | grep -i "pyarrow\|tensorflow\|^python"`
> pyarrow                   6.0.0    py39hff6fa39_1_cpu      conda-forge
> python                    3.9.7    hb7a2778_3_cpython      conda-forge
> python-dateutil           2.8.2    pyhd8ed1ab_0            conda-forge
> python-flatbuffers        1.12     pyhd8ed1ab_1            conda-forge
> python-irodsclient        1.0.0    pyhd8ed1ab_0            conda-forge
> python-rocksdb            0.7.0    py39h7fcd5f3_4          conda-forge
> python_abi                3.9      2_cp39                  conda-forge
> tensorflow                2.6.2    cuda112py39h9333c2f_0   conda-forge
> tensorflow-base           2.6.2    cuda112py39h7de589b_0   conda-forge
> tensorflow-estimator      2.6.2    cuda112py39h9333c2f_0   conda-forge
> tensorflow-gpu            2.6.2    cuda112py39h0bbbad9_0   conda-forge
>            Reporter: F. H.
>            Assignee: Uwe Korn
>            Priority: Major
>
> Hi, I am randomly getting the following error when first running inference with a Tensorflow model and then writing the result to a `.parquet` file:
> {code}
> Fatal error condition occurred in /home/conda/feedstock_root/build_artifacts/aws-c-io_1633633131324/work/source/event_loop.c:72: aws_thread_launch(&cleanup_thread, s_event_loop_destroy_async_thread_fn, el_group, &thread_options) == AWS_OP_SUCCESS
> Exiting Application
> ################################################################################
> Stack trace:
> ################################################################################
> /home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_backtrace_print+0x59) [0x7ffb14235f19]
> /home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_fatal_assert+0x48) [0x7ffb14227098]
> /home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../.././././libaws-c-io.so.1.0.0(+0x10a43) [0x7ffb1406ea43]
> /home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_ref_count_release+0x1d) [0x7ffb14237fad]
> /home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../.././././libaws-c-io.so.1.0.0(+0xe35a) [0x7ffb1406c35a]
> /home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1(aws_ref_count_release+0x1d) [0x7ffb14237fad]
> /home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../../././libaws-crt-cpp.so(_ZN3Aws3Crt2Io15ClientBootstrapD1Ev+0x3a) [0x7ffb142a2f5a]
> /home/<user>/miniconda3/envs/spliceai_env/lib/python3.9/site-packages/pyarrow/../../.././libaws-cpp-sdk-core.so(+0x5f570) [0x7ffb147fd570]
> /lib/x86_64-linux-gnu/libc.so.6(+0x49a27) [0x7ffb17f7da27]
> /lib/x86_64-linux-gnu/libc.so.6(on_exit+0) [0x7ffb17f7dbe0]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfa) [0x7ffb17f5b0ba]
> /home/<user>/miniconda3/envs/spliceai_env/bin/python3.9(+0x20aa51) [0x562576609a51]
> /bin/bash: line 1: 2341494 Aborted                 (core dumped)
> {code}
> My colleague ran into the same issue on CentOS 8 while running the same job with the same environment on SLURM, so I suspect it is an interaction between tensorflow and pyarrow.
> I also found a GitHub issue where multiple people report running into the same problem: [https://github.com/huggingface/datasets/issues/3310]
>
> Resolving this bug would be very important to my lab, as we currently cannot work with Parquet at all. Unfortunately, we do not have the knowledge to fix it ourselves.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)