[jira] [Comment Edited] (ARROW-7656) [Python] csv.ConvertOptions Documentation Is Unclear Around Disabling Type Inference

Joris Van den Bossche (Jira) Tue, 28 Jan 2020 02:40:25 -0800


    [ 
https://issues.apache.org/jira/browse/ARROW-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025029#comment-17025029
 ]


Joris Van den Bossche edited comment on ARROW-7656 at 1/28/20 10:39 AM:
------------------------------------------------------------------------

The docs say:

> Map column names to column types (disabling type inference on those columns).

So it only disables type inference for the columns for which you specified a 
type in {{column_types}}.

Now, assuming you are talking about the same example as you showed in 
https://issues.apache.org/jira/browse/ARROW-7655: by removing {{read_options}} 
from your call, you are no longer giving the column in your data a name. 
Hence, the column (with inferred name "0") is not specified in 
{{column_types}}, and its type is still inferred to be integer:

{code:python}
>>> table = csv.read_csv("test.csv", convert_options=convert_options)  #, read_options=read_options)
>>> table.schema
0: int64
>>> table.to_pandas()
   0
0  1
{code}

If I am misinterpreting your use case, can you provide a full reproducible 
example for this issue?





> [Python] csv.ConvertOptions Documentation Is Unclear Around Disabling Type 
> Inference
> ------------------------------------------------------------------------------------
>
>                 Key: ARROW-7656
>                 URL: https://issues.apache.org/jira/browse/ARROW-7656
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.15.1
>         Environment: Documentation, N/A.
>            Reporter: Tim Lantz
>            Priority: Minor
>              Labels: CSV
>
> High level description:
>  * The documentation 
> [here|https://arrow.apache.org/docs/python/generated/pyarrow.csv.ConvertOptions.html#pyarrow.csv.ConvertOptions]
>  says that setting column_types disables type inference.
>  * Under the hood, I can see why you also need to set 
> ReadOptions.column_names to support all current use cases. However, this is 
> unclear to new users of the library reading the docs, especially since 
> you can supply a Schema object to column_types in the Python bindings.
>  * Suggested change: update the csv.ConvertOptions docs to note that you must 
> also set csv.ReadOptions.column_names in order to disable type inference.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
