alex-d-jensen opened a new issue, #45640:
URL: https://github.com/apache/arrow/issues/45640
### Describe the bug, including details regarding any error messages, version, and platform.
## Sample code to demonstrate:
```python
import pandas as pd
import pyarrow as pa

pandas_dataframe = pd.DataFrame(
    {
        "x_simple_col": [123],
        "struct_col": [
            {
                "col9": "a_string",
                "col1": True,
                "a_nested_struct": {
                    "field": 1,
                    "a_field": 2,
                },
                "b_array": ["cheese"],
            },
        ],
    }
)
pyarrow_schema = pa.schema(
    fields=[
        pa.field("x_simple_col", pa.int64()),
        pa.field(
            name="struct_col",
            type=pa.struct(
                fields=[
                    pa.field(name="col9", type=pa.string()),
                    pa.field(name="col1", type=pa.bool_()),
                    pa.field(
                        name="a_nested_struct",
                        type=pa.struct(
                            fields=[
                                pa.field(name="field", type=pa.int64()),
                                pa.field(name="a_field", type=pa.int64()),
                            ]
                        ),
                    ),
                    pa.field(name="b_array",
                             type=pa.list_(value_type=pa.string())),
                ]
            ),
        ),
    ]
)
inferred_schema = pa.Schema.from_pandas(pandas_dataframe)
assert pyarrow_schema == inferred_schema
pyarrow_schema
inferred_schema
```
Gives output:
```
>>> assert pyarrow_schema == inferred_schema
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AssertionError
>>>
>>> pyarrow_schema
x_simple_col: int64
struct_col: struct<col9: string, col1: bool, a_nested_struct: struct<field: int64, a_field: int64>, b_array: list<item: string>>
  child 0, col9: string
  child 1, col1: bool
  child 2, a_nested_struct: struct<field: int64, a_field: int64>
      child 0, field: int64
      child 1, a_field: int64
  child 3, b_array: list<item: string>
      child 0, item: string
>>> inferred_schema
x_simple_col: int64
struct_col: struct<a_nested_struct: struct<a_field: int64, field: int64>, b_array: list<item: string>, col1: bool, col9: string>
  child 0, a_nested_struct: struct<a_field: int64, field: int64>
      child 0, a_field: int64
      child 1, field: int64
  child 1, b_array: list<item: string>
      child 0, item: string
  child 2, col1: bool
  child 3, col9: string
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 499
```
## Expected result:
`inferred_schema` and `pyarrow_schema` should match, including the order of
struct fields (the documentation for structs mentions that fields are ordered
and that order matters when comparing schemas).
## Actual result:
The inferred schema for structs (including structs nested inside other
structs) has its fields in alphabetical order, rather than in the order found
in the data from which the schema is inferred via `from_pandas`.
Top-level columns stay in the given order; this only affects fields inside structs.
## System info:
output from sw_vers:
ProductName: macOS
ProductVersion: 15.3.1
BuildVersion: 24D70
pyarrow version: 18.1.0 (also tried on 19.0.1).
pandas version: 2.2.3
### Component(s)
Python