[jira] [Commented] (ARROW-1291) [Python] pa.RecordBatch.from_pandas doesn't accept DataFrame with numeric column names
[ https://issues.apache.org/jira/browse/ARROW-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106168#comment-16106168 ]

Li Jin commented on ARROW-1291:
-------------------------------

I think it's OK not to maintain exact round-trip conversion between Arrow and other data representations. It's inevitable that other data representations have some exotic features that Arrow cannot support, and in my opinion it's a little too strict to error out in all cases.

Just to provide another data point (not saying this is correct, just for reference), Spark/Pandas conversion also casts int column names to strings.

> [Python] pa.RecordBatch.from_pandas doesn't accept DataFrame with numeric column names
> --------------------------------------------------------------------------------------
>
>                 Key: ARROW-1291
>                 URL: https://issues.apache.org/jira/browse/ARROW-1291
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.5.0
>            Reporter: Li Jin
>            Assignee: Wes McKinney
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> {code}
> import pyarrow as pa
> import pandas as pd
> df = pd.DataFrame([1])
> pa.RecordBatch.from_pandas(df)
> {code}
> Exception:
> {code}
> TypeError                                 Traceback (most recent call last)
> in ()
>       3
>       4 df = pd.DataFrame([1])
> ----> 5 pa.RecordBatch.from_pandas(df)
> table.pxi in pyarrow.lib.RecordBatch.from_pandas()
> table.pxi in pyarrow.lib._dataframe_to_arrays()
> /home/icexelloss/miniconda3/envs/spark-dev/lib/python3.5/site-packages/pyarrow/pandas_compat.py in construct_metadata(df, index_levels, preserve_index, types)
>     187             arrow_type=arrow_type
>     188         )
> --> 189         for name, arrow_type in zip(df.columns, df_types)
>     190     ] + (
>     191         [
> /home/icexelloss/miniconda3/envs/spark-dev/lib/python3.5/site-packages/pyarrow/pandas_compat.py in (.0)
>     187             arrow_type=arrow_type
>     188         )
> --> 189         for name, arrow_type in zip(df.columns, df_types)
>     190     ] + (
>     191         [
> /home/icexelloss/miniconda3/envs/spark-dev/lib/python3.5/site-packages/pyarrow/pandas_compat.py in get_column_metadata(column, name, arrow_type)
>     125         raise TypeError(
>     126             'Column name must be a string. Got column {} of type {}'.format(
> --> 127                 name, type(name).__name__
>     128             )
>     129         )
> TypeError: Column name must be a string. Got column 0 of type int64
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (ARROW-1291) [Python] pa.RecordBatch.from_pandas doesn't accept DataFrame with numeric column names
[ https://issues.apache.org/jira/browse/ARROW-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106155#comment-16106155 ]

Wes McKinney commented on ARROW-1291:
-------------------------------------

PR: https://github.com/apache/arrow/pull/911
[jira] [Commented] (ARROW-1291) [Python] pa.RecordBatch.from_pandas doesn't accept DataFrame with numeric column names
[ https://issues.apache.org/jira/browse/ARROW-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106151#comment-16106151 ]

Wes McKinney commented on ARROW-1291:
-------------------------------------

I'm more in favor of #2, mostly because renaming the columns on a DataFrame without destroying the original object will generally involve a memory doubling. You can assign to {{df.columns}} to avoid this, but {{df.rename(columns=str)}} will double memory.
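The two renaming approaches mentioned in the comment above can be sketched as follows. This is a minimal illustration, assuming only that pandas is installed; whether {{df.rename}} physically copies the underlying data depends on the pandas version and its copy settings, so the memory behaviour is as described in the thread rather than something this snippet demonstrates.

```python
import pandas as pd

df = pd.DataFrame([1])  # default integer column label: 0

# Approach 1: rename returns a new DataFrame; the original object is kept
# alongside it, which is where the extra memory in the comment comes from.
renamed = df.rename(columns=str)

# Approach 2: assign new labels to the existing object; no second
# DataFrame is created.
df.columns = [str(label) for label in df.columns]

print(list(renamed.columns), list(df.columns))
```

Both objects end up with string column labels; the difference is only in whether a second DataFrame exists afterwards.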
[jira] [Commented] (ARROW-1291) [Python] pa.RecordBatch.from_pandas doesn't accept DataFrame with numeric column names
[ https://issues.apache.org/jira/browse/ARROW-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105728#comment-16105728 ]

Phillip Cloud commented on ARROW-1291:
--------------------------------------

That could work, but then the round trip conversion is no longer exact.

It seems like the choice is "where should the surprise be?" or maybe "what's least surprising to users?" and that there are three options.

# Leave the behavior as is, and users of arrow need to handle their own input columns before sending dataframes to arrow. This is the current behavior.
# Add casting to strings in one direction, when the input is a dataframe with numeric columns. This gives IMO behavior that is more surprising than an error: when you call {{.to_pandas()}} you get back something different than what you put in. It's also not easy to tell that it's different by looking at the dataframe because of the way dataframes repr.
# Add enough metadata to preserve the current round trip behavior.

I favor #1 the most and #3 if we decide it really is necessary to allow numeric columns. With #3 we still lose some compatibility with other systems that want to read and write data that came from dataframes unless those systems want to handle integer columns.

I think #2 isn't a great option because it results in behavior in the public API that isn't obvious unless you know something about how both arrow and pandas work. Additionally, we can't just call {{str}} on every column and be done; we have to make additional decisions, like whether we allow mixed string and integer column names. Though maybe that's a red herring and we can just say "{{Int64Index}}s only", we still have to make that decision as well.
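To make option #2 and the mixed-label question above concrete, here is a small sketch in plain Python. The helper {{stringify_labels}} and its {{allow_mixed}} flag are hypothetical names invented for this illustration; they are not part of pyarrow or pandas, and the policy shown is only one of the possible decisions the comment refers to.

```python
def stringify_labels(labels, allow_mixed=False):
    """Cast column labels to str, rejecting ambiguous inputs.

    Hypothetical helper for illustration only.
    """
    labels = list(labels)

    # Decision 1: do we accept a mix of string and non-string labels?
    kinds = {isinstance(label, str) for label in labels}
    if len(kinds) > 1 and not allow_mixed:
        raise TypeError("mixed string and non-string column labels")

    result = [str(label) for label in labels]

    # Decision 2: casting can create duplicates, e.g. 0 and "0" both
    # become "0", so the mapping back is no longer unambiguous.
    if len(set(result)) != len(result):
        raise ValueError("column labels collide after casting to str")
    return result


print(stringify_labels([0, 1, 2]))
```

Even this short version has to take a position on mixed labels and on collisions, which is the extra decision-making the comment points out.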
[jira] [Commented] (ARROW-1291) [Python] pa.RecordBatch.from_pandas doesn't accept DataFrame with numeric column names
[ https://issues.apache.org/jira/browse/ARROW-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105716#comment-16105716 ]

Li Jin commented on ARROW-1291:
-------------------------------

+1
[jira] [Commented] (ARROW-1291) [Python] pa.RecordBatch.from_pandas doesn't accept DataFrame with numeric column names
[ https://issues.apache.org/jira/browse/ARROW-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105640#comment-16105640 ]

Wes McKinney commented on ARROW-1291:
-------------------------------------

How about we convert non-string column labels to strings for now and wait and see whether preserving the original labels on the way back becomes a real need? I think efforts beyond that may fall into the YAGNI category for the moment.
[jira] [Commented] (ARROW-1291) [Python] pa.RecordBatch.from_pandas doesn't accept DataFrame with numeric column names
[ https://issues.apache.org/jira/browse/ARROW-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105491#comment-16105491 ]

Li Jin commented on ARROW-1291:
-------------------------------

The use case I have is that I am passing a user-provided pandas DataFrame to Spark using Arrow. In my particular case, I don't care about the names of the columns in the pandas DataFrame because the column names are defined in Spark's schema, so it's weird to ask people to write out their column names in pandas just to throw them away later...

I think it's friendlier behavior to cast numeric columns to strings than to throw this exception. My use case is a bit special in that I don't care about the column names, so I could do the casting in my code. But I think other users might also find the current behavior surprising.

I agree it's probably not worth it for Arrow to preserve the numeric column names.
[jira] [Commented] (ARROW-1291) [Python] pa.RecordBatch.from_pandas doesn't accept DataFrame with numeric column names
[ https://issues.apache.org/jira/browse/ARROW-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105308#comment-16105308 ]

Phillip Cloud commented on ARROW-1291:
--------------------------------------

I'm -1 on allowing numeric column names since it adds an IMO unnecessary coupling to pandas semantics. With such a change, any tool that wants to read data out of an arrow array must now consider the origin of the data's column names, and cannot simply assume that the columns in the schema are always a simple list of strings. I don't think it's easy to make this behavior transparent to tools that use arrow, while OTOH a list of strings is easy to deal with in pretty much any system that arrow is a part of or will be a part of.

Since this is really only useful when doing pandas -> arrow -> pandas, and users of pandas can already refer to columns by positional index with {{.iloc}} I'm not convinced we should allow this.

I think adding metadata for indexes has less far-reaching effects because it's an optional feature of pandas that isn't a core part of arrow, while column names are non-negotiable.

I don't think it's too much to ask people to explicitly write out their column names as strings.

I *am* willing to be convinced though :)
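As a brief illustration of the {{.iloc}} point above, the following sketch (assuming pandas is installed; the labels "a" and "b" are arbitrary examples) shows that positional access still works once the columns have been given explicit string names.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]])  # integer labels 0 and 1 by default
df.columns = ["a", "b"]              # explicit string labels

# Positional access does not depend on what the labels are.
first = df.iloc[:, 0]
print(first.tolist())
```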
[jira] [Commented] (ARROW-1291) [Python] pa.RecordBatch.from_pandas doesn't accept DataFrame with numeric column names
[ https://issues.apache.org/jira/browse/ARROW-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105288#comment-16105288 ]

Li Jin commented on ARROW-1291:
-------------------------------

I think stringifying non-string columns is fine. Having metadata containing the original column labels sounds good, but I feel it is likely to get lost somewhere because other systems (Spark SQL, for instance) do not support non-string column labels.
[jira] [Commented] (ARROW-1291) [Python] pa.RecordBatch.from_pandas doesn't accept DataFrame with numeric column names
[ https://issues.apache.org/jira/browse/ARROW-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105153#comment-16105153 ]

Wes McKinney commented on ARROW-1291:
-------------------------------------

This is a known limitation because Arrow schemas must have all string field names. We might consider a default casting behavior (like stringifying non-string columns), since it's better than failing. We can always choose to persist the original column labels (pickled, if necessary) in the schema metadata.

cc [~cpcloud]
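The idea of stringifying labels while persisting the originals in metadata can be sketched in plain Python. Everything here is an assumption made for illustration: the metadata is modelled as an ordinary dict with bytes keys and values, and the key name and both helper functions are hypothetical, not an existing pyarrow API.

```python
import pickle

# Hypothetical metadata key, chosen only for this sketch.
LABELS_KEY = b"original_column_labels"


def encode_labels(labels):
    """Return string field names plus metadata holding the original labels."""
    labels = list(labels)
    names = [str(label) for label in labels]
    metadata = {LABELS_KEY: pickle.dumps(labels)}
    return names, metadata


def decode_labels(names, metadata):
    """Recover the original labels if stored, else fall back to the names."""
    if LABELS_KEY in metadata:
        return pickle.loads(metadata[LABELS_KEY])
    return list(names)


names, metadata = encode_labels([0, 1, "x"])
restored = decode_labels(names, metadata)
print(names, restored)
```

A reader that ignores the metadata still sees plain string names, while a pandas-aware reader can restore the integer labels. Unpickling is only safe for data from a trusted source, which is one reason a real design might prefer a different encoding.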