The following code dies with pyarrow 0.14.2:
import pyarrow as pa
import pyarrow.parquet as pq
schema = pa.schema([('timestamp', pa.timestamp('ns', tz='UTC')),])
writer = pq.ParquetWriter('foo.parquet', schema, coerce_timestamps='ns')
ts_array = pa.array([ int(1234567893141) ],
                    type=pa.timestamp('ns', tz='UTC'))
table = pa.Table.from_arrays([ ts_array ], names=['timestamp'])
writer.write_table(table)
writer.close()
with the message:
ValueError: Invalid value for coerce_timestamps: ns
That appears to be because of this code in _parquet.pxi:
cdef int _set_coerce_timestamps(
        self, ArrowWriterProperties.Builder* props) except -1:
    if self.coerce_timestamps == 'ms':
        props.coerce_timestamps(TimeUnit_MILLI)
    elif self.coerce_timestamps == 'us':
        props.coerce_timestamps(TimeUnit_MICRO)
    elif self.coerce_timestamps is not None:
        raise ValueError('Invalid value for coerce_timestamps: {0}'
                         .format(self.coerce_timestamps))
which restricts the choice to 'ms' or 'us', even though AFAICT everywhere
else also allows 'ns' (and there is a TimeUnit_NANO defined). Is this
intentional, or a bug?
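
To make the control flow concrete, here is a minimal pure-Python sketch of
that branch logic. The function name and the string return values are mine,
purely for illustration (the real code calls into the C++ properties
builder), but the conditions and the error message are copied from the
snippet above:

```python
def set_coerce_timestamps(coerce_timestamps):
    """Hypothetical pure-Python stand-in for _set_coerce_timestamps."""
    if coerce_timestamps == 'ms':
        return 'TimeUnit_MILLI'   # stands in for props.coerce_timestamps(...)
    elif coerce_timestamps == 'us':
        return 'TimeUnit_MICRO'
    elif coerce_timestamps is not None:
        # Any other non-None value, including 'ns', ends up here.
        raise ValueError('Invalid value for coerce_timestamps: {0}'
                         .format(coerce_timestamps))
    return None  # None means "no coercion requested"


# Walk through the accepted values plus 'ns' to see which branch each hits.
for unit in (None, 'ms', 'us', 'ns'):
    try:
        print(unit, '->', set_coerce_timestamps(unit))
    except ValueError as exc:
        print(unit, '->', exc)
```

Only None, 'ms' and 'us' get through; 'ns' falls into the final branch and
raises, which matches the error I am seeing.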
Thanks,
- db