[ https://issues.apache.org/jira/browse/ARROW-11390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272271#comment-17272271 ]

Lance Dacey commented on ARROW-11390:
-------------------------------------

Actually, turbodbc would have been installed before pyarrow: pyarrow 3.0 was
not yet on conda-forge, so I moved pyarrow down to the pip section. Do I need
to reverse this installation order?


{code:bash}
    && /opt/conda/bin/conda install -c conda-forge -yq \
    pandas \
    numpy \
    pyodbc \
    pybind11 \
    turbodbc \
    azure-storage-blob \
    azure-storage-common \
    xlrd \
    openpyxl \
    mysql-connector-python \
    zeep \
    xmltodict \
    dask \
    dask-labextension \
    pymssql=2.1 \
    sqlalchemy-redshift \
    python-snappy \
    seaborn \
    python-gitlab \
    pyxlsb \
    humanfriendly \
    jupyterlab \
    notebook=6.1.4 \
    pip \
    && /opt/conda/bin/pip install --no-cache-dir --upgrade pip \
                                            smartsheet-python-sdk \
                                            duo-client \
                                            adlfs \
                                            pyarrow \
                                            "apache-airflow[postgres,redis,celery,crypto,ssh,password]==$AIRFLOW_VERSION" \
{code}
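Because conda installs turbodbc first (presumably together with the pyarrow build it was compiled against) and the later `pip install pyarrow` then replaces or shadows that pyarrow, it can help to check which versions actually ended up in the image and which tool installed each one. A minimal diagnostic sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `describe_install` is hypothetical, and relying on the `INSTALLER` file is an assumption: pip writes it, but conda packages may omit it or record a different value.

```python
from importlib import metadata
from typing import Optional, Tuple


def describe_install(package: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (version, installer) for an installed distribution.

    The installer is read from the optional INSTALLER file in the
    package's .dist-info directory; it is "pip" for pip installs and
    may be missing (None) or hold another tool name for conda packages.
    """
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return None, None
    installer = dist.read_text("INSTALLER")  # None if the file is absent
    return dist.version, installer.strip() if installer else None


if __name__ == "__main__":
    # Run inside the built image to see what each package resolved to.
    for name in ("pyarrow", "turbodbc", "numpy"):
        version, installer = describe_install(name)
        print(f"{name:10} version={version} installer={installer}")
```

If `pyarrow` reports 3.0.0 installed by pip while `turbodbc` came from conda-forge, the two were most likely not built against the same Arrow release.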


I have not been able to get turbodbc to work with pip, which is why I am using
conda right now. I just tried again, passing the CFLAGS argument
"-D_GLIBCXX_USE_CXX11_ABI=0", but had no luck. I will try some more and perhaps
raise an issue on the turbodbc project.

Let me know if there is a proper way to install these libraries! Ideally with
plain pip, since my base image is from Airflow, which does not use conda by
default.
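For a pip-only image the ordering question matters even more, because pip builds turbodbc from source. Below is a hedged sketch of one possible sequence, assuming (1) turbodbc's `setup.py` only compiles its Arrow support when `numpy`, `pyarrow`, and `pybind11` are already importable at build time, (2) the unixODBC and Boost development headers are already present in the image, and (3) the `-D_GLIBCXX_USE_CXX11_ABI=0` flag mentioned above matches how the pyarrow wheels were compiled. The helper names `build_install_steps` and `build_env` are hypothetical; in a Dockerfile the same steps would simply be two consecutive `pip install` lines.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def build_install_steps(pyarrow_spec: str = "pyarrow==3.0.0") -> List[List[str]]:
    """Return pip commands in the order turbodbc's build needs them.

    Assumption: turbodbc only builds its Arrow support if numpy,
    pyarrow and pybind11 are importable when its setup.py runs,
    so they are installed in a separate, earlier pip call.
    """
    pip = [sys.executable, "-m", "pip", "install", "--no-cache-dir"]
    return [
        pip + ["numpy", pyarrow_spec, "pybind11"],
        # --no-cache-dir keeps pip from reusing a turbodbc wheel it
        # built earlier against a different pyarrow version.
        pip + ["turbodbc"],
    ]


def build_env(cxx11_abi: Optional[int] = 0) -> Dict[str, str]:
    """Copy the current environment, optionally setting the ABI CFLAGS."""
    env = dict(os.environ)
    if cxx11_abi is not None:
        env["CFLAGS"] = f"-D_GLIBCXX_USE_CXX11_ABI={cxx11_abi}"
    return env


if __name__ == "__main__":
    env = build_env()
    for cmd in build_install_steps():
        subprocess.run(cmd, check=True, env=env)
```

The design choice here is splitting the install into two pip invocations: a single `pip install pyarrow turbodbc` does not guarantee that pyarrow is already importable when turbodbc's build runs.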

> [Python] pyarrow 3.0 issues with turbodbc
> -----------------------------------------
>
>                 Key: ARROW-11390
>                 URL: https://issues.apache.org/jira/browse/ARROW-11390
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 3.0.0
>         Environment: pyarrow 3.0.0
> fsspec 0.8.4
> adlfs v0.5.9
> pandas 1.2.1
> numpy 1.19.5
> turbodbc 4.1.1
>            Reporter: Lance Dacey
>            Priority: Major
>              Labels: python, turbodbc
>
> This is more of a turbodbc issue, I think, but perhaps someone here has some
> idea of what changed in pyarrow 3.0 to cause it.
> {code:python}
> from turbodbc import connect
>
> connection = connect(dsn="my_dsn")  # hypothetical ODBC data source name
> cursor = connection.cursor()
> cursor.execute("select top 10 * from dbo.tickets")
> table = cursor.fetchallarrow()
> {code}
> I am able to run table.num_rows, and it prints 10.
> If I run table.to_pandas() or table.schema, or try to write the table to a
> dataset, my kernel dies with no explanation. I reverted to pyarrow 2.0
> and the same code works again.
> [https://github.com/blue-yonder/turbodbc/issues/289]
>  
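The report says the kernel "dies with no explanation", which is how a hard crash in compiled code (for example a segmentation fault) looks from Jupyter. One way to get more information is to run the failing steps in a separate interpreter with Python's built-in `faulthandler` enabled, so a crash prints a Python-level traceback instead of silently killing the kernel. A sketch under these assumptions: `run_isolated`, `REPRO`, and the DSN `my_dsn` are hypothetical names, and `turbodbc.connect(dsn=...)` assumes an ODBC data source is already configured.

```python
import subprocess
import sys
from typing import Tuple


def run_isolated(snippet: str, timeout: float = 120.0) -> Tuple[int, str]:
    """Run a Python snippet in a child interpreter with faulthandler on.

    If the snippet crashes, only the child process dies. A negative
    return code (e.g. -11 for SIGSEGV on Linux) plus the faulthandler
    traceback on stderr show which line triggered the crash.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


# Hypothetical reproduction of the report; "my_dsn" is a placeholder.
REPRO = """
from turbodbc import connect
cursor = connect(dsn="my_dsn").cursor()
cursor.execute("select top 10 * from dbo.tickets")
table = cursor.fetchallarrow()
print(table.num_rows)
print(table.schema)
"""

if __name__ == "__main__":
    code, err = run_isolated(REPRO)
    print("return code:", code)
    print(err)
```

Comparing the return code and traceback under pyarrow 2.0 and 3.0 would show whether the crash happens inside `fetchallarrow()` or only when the resulting table is first inspected.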



--
This message was sent by Atlassian Jira
(v8.3.4#803005)