johnasiano opened a new issue, #45177:
URL: https://github.com/apache/arrow/issues/45177
### Describe the bug, including details regarding any error messages, version, and platform.
```
import pandas as pd
import pyarrow as pa
import numpy as np
from datetime import datetime, timedelta

df = pd.DataFrame({
    "product_id": pd.Series(
        ["PROD_" + str(np.random.randint(1000, 9999)) for _ in range(100)],
        dtype=pd.StringDtype(storage="pyarrow")
    ),
    "transaction_timestamp": pd.date_range(
        start=datetime.now() - timedelta(days=30),
        periods=100,
        freq='1H'
    ),
    "sales_amount": pd.Series(
        np.round(np.random.normal(500, 150, 100), 2),
        dtype=pd.Float64Dtype()
    ),
    "customer_segment": pd.Series(
        np.random.choice(['Premium', 'Standard', 'Basic'], 100),
        dtype=pd.StringDtype(storage="pyarrow")
    ),
    "is_repeat_customer": pd.Series(
        np.random.choice([True, False], 100, p=[0.3, 0.7])
    )
})

# Map Arrow string columns back to a pyarrow-backed pandas StringDtype;
# every other Arrow type falls through (returns None) to the default conversion.
def types_mapper(pa_type):
    if pa_type == pa.string():
        return pd.StringDtype("pyarrow")

# Round-trip the frame through an Arrow table and compare with the original.
df = df.convert_dtypes(dtype_backend="pyarrow")
df_pa = pa.Table.from_pandas(df).to_pandas(types_mapper=types_mapper)
pd.testing.assert_frame_equal(df, df_pa)
```
The dtypes appear to be the same, but I get the following error:
```
AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="product_id") are different
Attribute "dtype" are different
[left]: string[pyarrow]
[right]: string[pyarrow]
```
### Component(s)
Python