[GitHub] [hudi] s-sanjay commented on issue #1895: HUDI Dataset backed by Hive Metastore fails on Presto with Unknown converted type TIMESTAMP_MICROS
s-sanjay commented on issue #1895: URL: https://github.com/apache/hudi/issues/1895#issuecomment-687637620 @FelixKJose did you check my comment? Hudi simply uses Spark's [SchemaConverters](https://github.com/apache/spark/blob/master/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L150), and that code does not let the user choose between millis and micros. We would have to fix it in Spark and then upgrade Hudi's Spark dependency to pick up the change. Copying that function into Hudi is not easy either, because it is called from multiple code paths. Fixing Presto makes the most sense, since even without Hudi, a query fails whenever someone writes micros. However, I do agree we need to document this, because even with the Presto fix it may not be possible for everyone to upgrade Presto or cherry-pick the fix. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
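To make the "no user setting" point concrete, here is a simplified sketch of the timestamp branch of the linked SchemaConverters code: a Spark timestamp column is emitted as an Avro `long` annotated with the `timestamp-micros` logical type. The function name `timestamp_field_schema` is hypothetical and it ignores nullability; it only models the mapping, not Spark's actual API.

```python
# Hypothetical, simplified model of how the linked SchemaConverters line maps
# a Spark TimestampType column to Avro. Nullability handling is omitted.
def timestamp_field_schema(name: str) -> dict:
    # There is no argument for choosing millis: the logical type is always
    # timestamp-micros, so spark.sql.parquet.outputTimestampType cannot change it.
    return {
        "name": name,
        "type": {"type": "long", "logicalType": "timestamp-micros"},
    }


if __name__ == "__main__":
    print(timestamp_field_schema("event_time"))
```

Because the Parquet file written from this Avro schema carries the micros annotation, a reader that only understands millisecond timestamps rejects the column, which is why the fix was pursued on the Presto side.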
s-sanjay commented on issue #1895: URL: https://github.com/apache/hudi/issues/1895#issuecomment-679406205 @FelixKJose I have raised a [PR](https://github.com/prestodb/presto/pull/15074)
s-sanjay commented on issue #1895: URL: https://github.com/apache/hudi/issues/1895#issuecomment-669845428 Right now Presto does not support reading the TIMESTAMP_MICROS type. This needs to be fixed on the Presto side, and I am working on a fix. (Presto only supports timestamps up to millisecond granularity, so the fix will simply convert microseconds to milliseconds.) I think `spark.sql.parquet.outputTimestampType` is not working because Hudi uses Spark's [SchemaConverters](https://github.com/apache/spark/blob/master/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L150), which does not even look at this option. This may be because that property controls the Parquet type, whereas Hudi uses the Avro format to store the file's schema within Parquet. It would be very difficult to change this from the Hudi or Spark side. For now, the easiest option is to use the double type, as mentioned above, until the fix is merged into Presto. I will share the PR link here in a couple of days (I need to refactor it, since the Presto version we run is a custom internal one).
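The conversion described above amounts to dropping the sub-millisecond digits of an epoch-microsecond value. Below is a minimal sketch, assuming the values are microseconds since the Unix epoch; `micros_to_millis` is a hypothetical helper, not code from the Presto PR, and the choice of floor (rather than truncation toward zero) for pre-1970 values is an assumption.

```python
MICROS_PER_MILLI = 1_000


def micros_to_millis(epoch_micros: int) -> int:
    """Convert epoch microseconds to epoch milliseconds (hypothetical helper).

    Python's // floors toward negative infinity, so a pre-1970 value such as
    -1 microsecond maps to -1 millisecond (the earlier instant) rather than 0.
    """
    return epoch_micros // MICROS_PER_MILLI


if __name__ == "__main__":
    print(micros_to_millis(1_596_672_000_123_456))
    print(micros_to_millis(-1))
```

Flooring keeps the converted instant at or before the original one for every input, which is the same behavior Java's `Math.floorDiv` would give; plain truncation toward zero would shift negative timestamps forward instead.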