alamb commented on code in PR #7285:
URL: https://github.com/apache/arrow-rs/pull/7285#discussion_r1996378155
##########
parquet/src/arrow/arrow_reader/mod.rs:
##########
@@ -1455,17 +1455,56 @@ mod tests {
#[test]
fn test_int96_single_column_reader_test() {
let encodings = &[Encoding::PLAIN, Encoding::RLE_DICTIONARY];
- run_single_column_reader_tests::<Int96Type, _, Int96Type>(
- 2,
- ConvertedType::NONE,
- None,
- |vals| {
+
+ let resolutions: Vec<(Option<ArrowDataType>, fn(&[Option<Int96>]) -> ArrayRef)> = vec![
+ (None, |vals: &[Option<Int96>]| {
Arc::new(TimestampNanosecondArray::from_iter(
vals.iter().map(|x| x.map(|x| x.to_nanos())),
- )) as _
- },
- encodings,
- );
+ )) as ArrayRef
+ }),
+ (
+ Some(ArrowDataType::Timestamp(TimeUnit::Second, None)),
+ |vals: &[Option<Int96>]| {
+ Arc::new(TimestampSecondArray::from_iter(
+ vals.iter().map(|x| x.map(|x| x.to_seconds())),
+ )) as ArrayRef
+ },
+ ),
+ (
+ Some(ArrowDataType::Timestamp(TimeUnit::Millisecond, None)),
+ |vals: &[Option<Int96>]| {
+ Arc::new(TimestampMillisecondArray::from_iter(
+ vals.iter().map(|x| x.map(|x| x.to_millis())),
+ )) as ArrayRef
+ },
+ ),
+ (
+ Some(ArrowDataType::Timestamp(TimeUnit::Microsecond, None)),
+ |vals: &[Option<Int96>]| {
+ Arc::new(TimestampMicrosecondArray::from_iter(
+ vals.iter().map(|x| x.map(|x| x.to_micros())),
+ )) as ArrayRef
+ },
+ ),
+ (
+ Some(ArrowDataType::Timestamp(TimeUnit::Nanosecond, None)),
Review Comment:
Can you please also add test coverage for a timestamp with a timezone?
Specifically, both UTC and something that is not UTC
(all these tests use `None`).
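
One way to enumerate the extra cases is sketched below. It is only an illustration: `UnitSketch`, `TimestampTypeSketch`, and `timestamp_cases` are hypothetical stand-ins so the block compiles without the arrow crates, and the `"+01:00"` offset is just one possible non-UTC choice. The `Option<Arc<str>>` field mirrors the timezone slot of `ArrowDataType::Timestamp`.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for arrow's `TimeUnit`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum UnitSketch {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Hypothetical stand-in for `ArrowDataType::Timestamp(unit, timezone)`.
#[derive(Clone, Debug, PartialEq)]
pub struct TimestampTypeSketch {
    pub unit: UnitSketch,
    pub timezone: Option<Arc<str>>,
}

/// Builds every unit/timezone combination: no timezone, UTC, and one non-UTC offset.
pub fn timestamp_cases() -> Vec<TimestampTypeSketch> {
    let units = [
        UnitSketch::Second,
        UnitSketch::Millisecond,
        UnitSketch::Microsecond,
        UnitSketch::Nanosecond,
    ];
    // "+01:00" is an assumed example of a non-UTC timezone.
    let timezones: [Option<&str>; 3] = [None, Some("UTC"), Some("+01:00")];

    let mut cases = Vec::new();
    for unit in units {
        for tz in timezones {
            cases.push(TimestampTypeSketch {
                unit,
                timezone: tz.map(Arc::from),
            });
        }
    }
    cases
}
```

In the real test, each case would presumably be paired with the matching conversion closure, as the existing `None` entries already are.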
##########
parquet/src/record/api.rs:
##########
@@ -701,7 +701,7 @@ impl Field {
/// `Timestamp` value.
#[inline]
pub fn convert_int96(_descr: &ColumnDescPtr, value: Int96) -> Self {
- Field::TimestampMillis(value.to_i64())
+ Field::TimestampMillis(value.to_millis())
Review Comment:
I think this should be to_nanos() to preserve the old behavior.
But then again, it doesn't make sense to return a nanosecond timestamp for a
value with millisecond precision 🤔
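
To make the difference between the two conversions concrete, here is a minimal sketch. It assumes the usual INT96 timestamp layout (two `u32` words of nanoseconds within the day, low word first, followed by the Julian day). `Int96Sketch` is a hypothetical stand-in, not the crate's `Int96`, and it does not guard against overflow.

```rust
/// Hypothetical stand-in for parquet's `Int96`; not the crate's type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Int96Sketch {
    /// [nanos-of-day low word, nanos-of-day high word, Julian day]
    pub value: [u32; 3],
}

/// Julian day number of 1970-01-01.
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
const SECONDS_PER_DAY: i64 = 86_400;
const NANOS_PER_SECOND: i64 = 1_000_000_000;

impl Int96Sketch {
    /// Splits the value into (days since the Unix epoch, nanoseconds within the day).
    fn day_and_nanos(&self) -> (i64, i64) {
        let day = self.value[2] as i64 - JULIAN_DAY_OF_EPOCH;
        let nanos = ((self.value[1] as i64) << 32) | self.value[0] as i64;
        (day, nanos)
    }

    /// Milliseconds since the epoch; sub-millisecond digits are truncated.
    pub fn to_millis(&self) -> i64 {
        let (day, nanos) = self.day_and_nanos();
        day * SECONDS_PER_DAY * 1_000 + nanos / 1_000_000
    }

    /// Nanoseconds since the epoch (no overflow handling in this sketch).
    pub fn to_nanos(&self) -> i64 {
        let (day, nanos) = self.day_and_nanos();
        day * SECONDS_PER_DAY * NANOS_PER_SECOND + nanos
    }
}
```

For a value one day and 1.5 ms after the epoch, `to_millis()` gives 86_400_001 while `to_nanos()` gives 86_400_001_500_000, so storing the latter in a `TimestampMillis` field would be off by a factor of a million.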
##########
parquet/src/data_type.rs:
##########
@@ -56,31 +74,55 @@ impl Int96 {
self.value = [elem0, elem1, elem2];
}
- /// Converts this INT96 into an i64 representing the number of MILLISECONDS since Epoch
- pub fn to_i64(&self) -> i64 {
Review Comment:
I think we need to keep to_i64 in order to avoid breaking the public API.
You can see it listed here:
https://docs.rs/parquet/latest/parquet/data_type/struct.Int96.html
We can put it back and mark it as deprecated:
https://github.com/apache/arrow-rs?tab=readme-ov-file#deprecation-guidelines
Likewise for to_second_and_nanoseconds.
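
Roughly what I have in mind, as a sketch only: `Int96Sketch` is a hypothetical stand-in for the real type, the `since` version is a placeholder to be replaced with the actual release, and the note text is just a suggestion.

```rust
/// Hypothetical stand-in for parquet's `Int96`; not the crate's type.
pub struct Int96Sketch {
    /// [nanos-of-day low word, nanos-of-day high word, Julian day]
    pub value: [u32; 3],
}

impl Int96Sketch {
    /// Milliseconds since the Unix epoch (Julian day 2_440_588).
    pub fn to_millis(&self) -> i64 {
        let day = self.value[2] as i64 - 2_440_588;
        let nanos = ((self.value[1] as i64) << 32) | self.value[0] as i64;
        day * 86_400_000 + nanos / 1_000_000
    }

    /// Kept so existing callers still compile; forwards to the new method.
    /// The `since` value is a placeholder, not a real release number.
    #[deprecated(since = "0.0.0", note = "use `to_millis` instead")]
    pub fn to_i64(&self) -> i64 {
        self.to_millis()
    }
}
```

Callers of `to_i64` then get a compiler warning pointing them at the replacement instead of a hard break.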
##########
parquet/src/arrow/array_reader/primitive_array.rs:
##########
@@ -37,13 +40,13 @@ use std::sync::Arc;
/// Provides conversion from `Vec<T>` to `Buffer`
pub trait IntoBuffer {
- fn into_buffer(self) -> Buffer;
+ fn into_buffer(self, target_type: &ArrowType) -> Buffer;
}
macro_rules! native_buffer {
($($t:ty),*) => {
$(impl IntoBuffer for Vec<$t> {
- fn into_buffer(self) -> Buffer {
+ fn into_buffer(self, _target_type: &ArrowType) -> Buffer {
Review Comment:
👍