gustavoatt commented on issue #1138:
URL: https://github.com/apache/iceberg/issues/1138#issuecomment-675155272
> Currently, there is no adjustment to int96 timestamp values. And I believe that any adjustment Spark makes is based on the current session time zone. If there needs to be an adjustment to imported int96 timestamps for Impala or any other writer that was incorrect, then I think it makes sense to add a static offset for all int96 values, assuming that all of them were written the same way.
>
> I'm happy to not add this if no one needs it, but I think it is the remaining piece to solve any problems that might come up.

Yes, Spark makes the adjustment based on the current session timezone:
```scala
// PARQUET_INT96_TIMESTAMP_CONVERSION says to apply timezone conversions to int96 timestamps'
// *only* if the file was created by something other than "parquet-mr", so check the actual
// writer here for this file. We have to do this per-file, as each file in the table may
// have different writers.
// Define isCreatedByParquetMr as function to avoid unnecessary parquet footer reads.
def isCreatedByParquetMr: Boolean =
  footerFileMetaData.getCreatedBy().startsWith("parquet-mr")
val convertTz =
  if (timestampConversion && !isCreatedByParquetMr) {
    Some(DateTimeUtils.getTimeZone(sharedConf.get(SQLConf.SESSION_LOCAL_TIMEZONE.key)))
  } else {
    None
  }
```
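
For reference, the two settings driving that decision are `spark.sql.parquet.int96TimestampConversion` (surfaced above as `timestampConversion`) and `spark.sql.session.timeZone`. The following is only a minimal sketch, not Iceberg or Spark internals, of how the same decision could be reproduced against a plain `SparkSession`; it uses `java.util.TimeZone` instead of Spark's internal `DateTimeUtils`, and the `createdBy` strings are illustrative examples:

```scala
import java.util.TimeZone
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("int96-conversion-check")
  // Apply the time zone conversion only for files NOT written by parquet-mr.
  .config("spark.sql.parquet.int96TimestampConversion", "true")
  // Session time zone that the conversion is based on.
  .config("spark.sql.session.timeZone", "America/Los_Angeles")
  .getOrCreate()

// Mirrors the reader's decision: convert only when the feature is enabled
// and the file's created_by metadata is not parquet-mr (e.g. Impala files).
def convertTzFor(createdBy: String): Option[TimeZone] = {
  val timestampConversion =
    spark.conf.get("spark.sql.parquet.int96TimestampConversion").toBoolean
  val isCreatedByParquetMr = createdBy.startsWith("parquet-mr")
  if (timestampConversion && !isCreatedByParquetMr) {
    Some(TimeZone.getTimeZone(spark.conf.get("spark.sql.session.timeZone")))
  } else {
    None
  }
}

// Spark/Hive files carry a parquet-mr created_by, so no conversion is applied.
println(convertTzFor("parquet-mr version 1.10.1")) // None
// An Impala-written file would get the session time zone applied.
println(convertTzFor("impala version 3.4.0"))      // Some(<America/Los_Angeles>)
```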
On our end, we don't really need to add this offset, since we only have to deal with int96 timestamps written by either Spark or Hive.
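
For completeness, if a static offset for incorrectly written int96 values (e.g. from an older Impala writer) were ever needed, the adjustment could be as simple as adding a fixed number of microseconds after decoding. This is purely a hypothetical sketch; `writerOffsetMicros`, the `+7h` example, and the sign of the shift are assumptions, and nothing like this exists in Iceberg today:

```scala
import java.util.concurrent.TimeUnit

// Hypothetical sketch of a static int96 adjustment; none of these names exist in Iceberg.
object Int96StaticOffset {
  // Example: a writer that stored local wall-clock time for UTC-7 as if it were UTC
  // would need roughly +7 hours added back to recover the true UTC instant.
  val writerOffsetMicros: Long = TimeUnit.HOURS.toMicros(7)

  /** Apply the fixed offset to an int96 value already decoded to epoch microseconds. */
  def adjust(epochMicros: Long, applyOffset: Boolean): Long =
    if (applyOffset) epochMicros + writerOffsetMicros else epochMicros
}
```

Since all of our int96 data comes from parquet-mr based writers, that offset would effectively be zero for us.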