msalib opened a new issue, #1932:
URL: https://github.com/apache/arrow-rs/issues/1932
**Describe the bug**
I cannot figure out how to write a parquet file with a timestamp column that
gets encoded as UTC. All my efforts produce files with naive timestamps and no
UTC metadata.
**To Reproduce**
The program below writes a tiny parquet file to `/tmp/q.parquet`.
Inspecting the resulting file with both `pqrs` and `pandas/pyarrow` shows that
no timezone is present: the `metric_date` column is a naive timestamp.
```rust
use std::sync::Arc;

use arrow::{
    array::{StringArray, TimestampMillisecondArray},
    datatypes::{DataType, Field, Schema, TimeUnit},
    record_batch::RecordBatch,
};
use parquet::{
    arrow::arrow_writer::ArrowWriter,
    file::properties::{WriterProperties, WriterVersion},
};

fn main() {
    //let tz = Some("UTC".to_owned());
    let tz = None;
    let fields = vec![
        Field::new(
            "metric_date",
            DataType::Timestamp(TimeUnit::Millisecond, tz.clone()),
            false,
        ),
        Field::new("my_id", DataType::Utf8, false),
    ];
    let schema = Arc::new(Schema::new(fields));

    let my_ids = Arc::new(StringArray::from(vec!["hi", "there"]));
    let dates = Arc::new(TimestampMillisecondArray::from_vec(
        vec![1234532523, 1234124],
        tz,
    ));
    let batch = RecordBatch::try_new(schema.clone(), vec![dates, my_ids]).unwrap();

    let f = std::fs::File::create("/tmp/q.parquet").unwrap();
    let props = WriterProperties::builder()
        .set_writer_version(WriterVersion::PARQUET_2_0)
        .build();
    let mut writer = ArrowWriter::try_new(f, schema, Some(props)).unwrap();
    writer.write(&batch).unwrap();
    writer.close().unwrap();
    println!("Hello, world!");
}
```
**Additional context**
Tested using arrow="16.0.0" and parquet="16.0.0".