zeddit opened a new issue, #45503:
URL: https://github.com/apache/arrow/issues/45503
### Describe the enhancement requested
My CSV has date/datetime fields such as `20250210` (`pa.date32()`) and
`2025021106000062` (`pa.timestamp('ms')`), which cannot be converted directly.
Currently, `csv.read_csv` cannot parse these values as `date32` and cannot
convert fractional seconds, e.g. milliseconds. My workaround is to go through
pandas, as follows:
1. Change the schema to `pa.string()` for those date/datetime fields.
2. Read the input file with `read_csv`.
3. Manually convert each column to the type I need, e.g.
`pa.array(pd.to_datetime(table['date'], format='%Y%m%d')).cast(pa.date32(), safe=True)` and
`pa.array(pd.to_datetime(table['timestamp_ms'], format='%Y%m%d%H%M%S%f')).cast(pa.timestamp('ms'), safe=True)`.
4. Assemble the Arrow table I need with
`pa.Table.from_arrays([converted_arrays, part_of_original_table_arrays], schema=original_schema)`.
This is quite inefficient. Is there a more efficient way to do this
conversion? Thanks.
### Component(s)
Python