[ 
https://issues.apache.org/jira/browse/ARROW-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438623#comment-16438623
 ] 

ASF GitHub Bot commented on ARROW-2101:
---------------------------------------

xhochy commented on issue #1886: ARROW-2101: [Python/C++] Correctly convert 
numpy arrays of bytes to arrow arrays of strings when user specifies arrow type 
of string
URL: https://github.com/apache/arrow/pull/1886#issuecomment-381388117
 
 
   Not sure if there were more comment on it, but just want to iterate on 
   
   >> Also, this doesn't change anything for Python 2 if using 'str' objects 
and the type is not specified, it will still create a BinaryArray, is this what 
we want?
   
   > Probably. Python 2 str objects are bytestrings just like Python 3 bytes 
objects.
   
   Yes this is definitely the indented behaviour. We had some discussion in the 
past about it and stuck to the following when no type is specified:
   
   ```
   str(PY2) / bytes(PY3) –> pa.binary
   unicode(PY2) / str(PY3) –> pa.string
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] from_pandas reads 'str' type as binary Arrow data with Python 2
> ------------------------------------------------------------------------
>
>                 Key: ARROW-2101
>                 URL: https://issues.apache.org/jira/browse/ARROW-2101
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>            Reporter: Bryan Cutler
>            Assignee: Bryan Cutler
>            Priority: Major
>              Labels: pull-request-available
>
> Using Python 2, converting Pandas with 'str' data to Arrow results in Arrow 
> data of binary type, even if the user supplies type information.  conversion 
> of 'unicode' type works to create Arrow data of string types.  For example
> {code}
> In [25]: pa.Array.from_pandas(pd.Series(['a'])).type
> Out[25]: DataType(binary)
> In [26]: pa.Array.from_pandas(pd.Series(['a']), type=pa.string()).type
> Out[26]: DataType(binary)
> In [27]: pa.Array.from_pandas(pd.Series([u'a'])).type
> Out[27]: DataType(string)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to