[
https://issues.apache.org/jira/browse/ARROW-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025029#comment-17025029
]
Joris Van den Bossche edited comment on ARROW-7656 at 1/28/20 10:39 AM:
------------------------------------------------------------------------
The docs say:
> Map column names to column types (disabling type inference on those columns).
So it only disables type inference for the column for which you specified a
type in {{column_types}}.
Now, assuming you are talking here about the same example as you showed in
https://issues.apache.org/jira/browse/ARROW-7655, by removing {{read_options}}
from your call, you are no longer giving the column in your data a name.
Hence, the column (with inferred name "0") is not specified in
{{column_types}}, and its type is still inferred to be integer:
{code:python}
>>> table = csv.read_csv("test.csv", convert_options=convert_options)  #, read_options=read_options)
>>> table.schema
0: int64
>>> table.to_pandas()
   0
0  1
{code}
If I am misinterpreting your use case, can you provide a full reproducible
example for this issue?
> [Python] csv.ConvertOptions Documentation Is Unclear Around Disabling Type
> Inference
> ------------------------------------------------------------------------------------
>
> Key: ARROW-7656
> URL: https://issues.apache.org/jira/browse/ARROW-7656
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.15.1
> Environment: Documentation, N/A.
> Reporter: Tim Lantz
> Priority: Minor
> Labels: CSV
>
> High level description:
> * The documentation
> [here|https://arrow.apache.org/docs/python/generated/pyarrow.csv.ConvertOptions.html#pyarrow.csv.ConvertOptions]
> says that setting column_types disables type inference.
> * Under the hood I can see why you also need to set
> ReadOptions.column_names to support all current use cases; however, this is
> unclear to new users of the library reading the docs, especially since
> you can supply a Schema object to column_types in the Python bindings.
> * Suggested change: update the csv.ConvertOptions documentation to note that
> you must also set csv.ReadOptions.column_names in order to disable type inference.