[jira] [Commented] (ARROW-2406) [Python] Segfault when creating PyArrow table from Pandas for empty string column when schema provided

Rok Mihevc (Jira) Tue, 10 Jan 2023 23:27:55 -0800


    [ 
https://issues.apache.org/jira/browse/ARROW-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17659433#comment-17659433
 ]


Rok Mihevc commented on ARROW-2406:
-----------------------------------

This issue has been migrated to [issue 
#18470|https://github.com/apache/arrow/issues/18470] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Python] Segfault when creating PyArrow table from Pandas for empty string 
> column when schema provided
> ------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-2406
>                 URL: https://issues.apache.org/jira/browse/ARROW-2406
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>         Environment: Mac OS High Sierra
> Python 3.6.3
>            Reporter: Dave Challis
>            Priority: Major
>             Fix For: 0.9.0
>
>
> Minimal example to recreate:
> {code}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'a': []})
> df['a'] = df['a'].astype(str)
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
>  
> This causes the Python interpreter to exit with "Segmentation fault: 11".
> The following examples all work without any issue:
> {code}
> # column 'a' is no longer empty
> df = pd.DataFrame({'a': ['foo']})
> df['a'] = df['a'].astype(str)
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> {code}
> # column 'a' is empty, but no schema is specified
> df = pd.DataFrame({'a': []})
> df['a'] = df['a'].astype(str)
> pa.Table.from_pandas(df)
> {code}
> {code}
> # column 'a' is empty, but is not cast to 'str' in pandas
> df = pd.DataFrame({'a': []})
> schema = pa.schema([pa.field('a', pa.string())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
>  
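
Because the failing case takes the whole interpreter down, it can be convenient to run the reproduction in a child process and inspect how it exited. The following is only a minimal sketch under stated assumptions: the names REPRO and run_isolated are hypothetical helpers that are not part of pandas or PyArrow, pandas and pyarrow are assumed to be installed in the child interpreter, and the negative return code convention applies to POSIX systems (such as the macOS environment in the report), not to Windows.

```python
import signal
import subprocess
import sys

# The failing snippet from the report, kept as a string so that it runs in a
# separate interpreter. REPRO is a hypothetical name used for illustration.
REPRO = """
import pandas as pd
import pyarrow as pa
df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)
"""


def run_isolated(code, timeout=120):
    """Run `code` in a fresh Python process and classify how it exited.

    Returns a (status, detail) tuple where status is "ok", "error"
    (ordinary non-zero exit, e.g. an uncaught exception) or "crashed"
    (killed by a signal, e.g. SIGSEGV).
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    rc = proc.returncode
    if rc < 0:
        # On POSIX, a negative return code -N means the child was
        # terminated by signal N.
        try:
            detail = signal.Signals(-rc).name
        except ValueError:
            detail = "signal %d" % -rc
        return ("crashed", detail)
    if rc != 0:
        # Report the last line of stderr, usually the exception message.
        lines = proc.stderr.strip().splitlines()
        detail = lines[-1] if lines else ""
        return ("error", detail)
    return ("ok", "")
```

Calling run_isolated(REPRO) on an affected installation would be expected to report a crash with SIGSEGV, while a version containing the fix should report "ok". If pandas or pyarrow is missing in the child interpreter, the result is an "error" carrying the import failure message instead.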



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
