AlenkaF commented on issue #47592:
URL: https://github.com/apache/arrow/issues/47592#issuecomment-3312003727

   Here you are mixing a single-file write (whose Parquet footer also stores the
pandas metadata) with a hive-partitioned read, and that mismatch is what causes
the issue.
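
   For context, hive partitioning derives partition fields from key=value
directory segments in the dataset path, so pointing the hive reader at a single
file gives the discovery nothing to work with. A minimal sketch of that mismatch
(with a hypothetical file name, not your exact code, since your snippet isn't
quoted here):
   
   ```python
   >>> import pandas as pd
   >>> from pyarrow.dataset import dataset
   >>>
   >>> df = pd.DataFrame({"0": [1, 2, 3], "run_date": ["2025-09-17"] * 3})
   >>> df.to_parquet("test.parquet")  # one file; pandas metadata in the footer
   >>>
   >>> ds = dataset("test.parquet", format="parquet", partitioning="hive")
   >>> # A bare file path has no key=value directory segments, so no partition
   >>> # field is discovered from it; run_date stays a regular data column.
   ```
   
   To make the example work, both the write and the read need to use
partitioning: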
   
   ```python
   >>> import pandas as pd
   >>> df = pd.DataFrame({
   ...     "0": [1, 2, 3],
   ...     "1": [4, 5, 6],
   ...     "run_date": ["2025-09-17", "2025-09-17", "2025-09-17"]
   ... })
   >>> df.to_parquet("./test-pd-data", partition_cols=["run_date"])
   >>>
   >>> from pyarrow.dataset import dataset
   >>> ds = dataset("test-pd-data", format="parquet", partitioning="hive")
   >>> table = ds.to_table()
   >>> print(table.schema)
   0: int64
   1: int64
   run_date: string
   -- schema metadata --
   pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 620
   >>> table.to_pandas()
      0  1    run_date
   0  1  4  2025-09-17
   1  2  5  2025-09-17
   2  3  6  2025-09-17
   ```
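
   The same partitioned directory also reads back through pyarrow.parquet.read_table
or pandas.read_parquet, which apply hive partitioning discovery automatically. A
small sketch (note that, depending on the pyarrow version, the partition column
can come back as a dictionary/categorical type rather than plain strings):
   
   ```python
   >>> import pandas as pd
   >>> import pyarrow.parquet as pq
   >>>
   >>> # read_table detects the hive layout of the directory on its own;
   >>> # the partition column is appended after the file columns
   >>> pq.read_table("test-pd-data").column_names
   ['0', '1', 'run_date']
   >>>
   >>> # the pandas round trip goes through the same dataset machinery
   >>> pd.read_parquet("test-pd-data").shape
   (3, 3)
   ```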
   
   Just to confirm: are you only seeing issues on the read side? The exact
problem isn't clear to me yet, so a fuller description of what you ran and the
error output you got would really help.

