[ https://issues.apache.org/jira/browse/HIVE-26612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stamatis Zampetakis updated HIVE-26612:
---------------------------------------
Description:
If a Parquet file has a column of type "int64 eventtime (TIMESTAMP(MILLIS,true))", reading it fails with the following error:
{noformat}
exec.Task: Failed with exception java.io.IOException:org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file file:/home/steve/upstream/hive/itests/qtest/target/tmp/parquet_format_ts_as_bigint/part-00000/timestamp_as_bigint.parquet
java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file file:/home/steve/upstream/hive/itests/qtest/target/tmp/parquet_format_ts_as_bigint/part-00000/timestamp_as_bigint.parquet
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:624)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:531)
	at org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:197)
	at org.apache.hadoop.hive.ql.exec.FetchTask.execute(FetchTask.java:98)
{noformat}
The Parquet file can be created in Spark with the following steps:
{noformat}
spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MILLIS")
spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "LEGACY")
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "LEGACY")
spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInRead", "LEGACY")
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")
{noformat}
[1]
{noformat}
import java.sql.Timestamp // required for Timestamp.valueOf

val df = Seq(
  (1, Timestamp.valueOf("2014-01-01 23:00:01")),
  (1, Timestamp.valueOf("2014-11-30 12:40:32")),
  (2, Timestamp.valueOf("2016-12-29 09:54:00")),
  (2, Timestamp.valueOf("2016-05-09 10:12:43"))
).toDF("typeid", "eventtime")
{noformat}
[2] Schema of a file with eventtime stored as int64 (TIMESTAMP(MILLIS,true)):
{noformat}
[root@c4839-node3 test_parquet2]# parquet-tools schema part-00001-6c90b794-90b9-4cc0-afc5-2e49a4e96bad-c000.snappy.parquet
message spark_schema {
  required int32 typeid;
  optional int64 eventtime (TIMESTAMP(MILLIS,true));
}
{noformat}
[3] Schema of a file with eventtime stored as int96:
{noformat}
[root@c4839-node3 test_parquet1]# parquet-tools schema part-00001-cb1aeebb-ec87-4273-82ec-911c4fb605b6-c000.snappy.parquet
message spark_schema {
  required int32 typeid;
  optional int96 eventtime;
}
{noformat}

> Hive cannot read parquet files with int64 (TIMESTAMP_MILLIS)
> ------------------------------------------------------------
>
>                 Key: HIVE-26612
>                 URL: https://issues.apache.org/jira/browse/HIVE-26612
>             Project: Hive
>          Issue Type: Bug
>          Components: Database/Schema
>            Reporter: Steve Carlin
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
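The two schemas in the description store eventtime differently: as an int64 annotated with TIMESTAMP(MILLIS,true), meaning milliseconds since the Unix epoch with isAdjustedToUTC=true, or as the legacy unannotated int96. The following is a minimal Scala sketch, not Hive's or parquet-mr's reader code, that decodes both physical representations to java.time.Instant. The object and method names are hypothetical. The int96 layout assumed here (8 little-endian bytes of nanoseconds within the day, followed by a 4-byte little-endian Julian day number) is the convention used by Impala, Hive and Spark writers. Note that Timestamp.valueOf in the Spark steps interprets its string in the JVM's default time zone, so the stored millisecond values depend on that zone. The test below treats 2014-01-01 23:00:01 as UTC.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.time.Instant

// Hypothetical helpers illustrating the two physical encodings of `eventtime`.
// This is a sketch, not Hive's or parquet-mr's actual reader implementation.
object ParquetTimestampSketch {
  // Julian day number of 1970-01-01 (the Unix epoch).
  val JulianDayOfUnixEpoch: Long = 2440588L
  val SecondsPerDay: Long = 86400L

  // int64 TIMESTAMP(MILLIS, true): milliseconds since the Unix epoch, UTC-normalized.
  def fromInt64Millis(millis: Long): Instant = Instant.ofEpochMilli(millis)

  // Legacy int96 (assumed layout): 8 bytes nanos-of-day, then 4 bytes Julian day,
  // both little-endian.
  def fromInt96(bytes: Array[Byte]): Instant = {
    require(bytes.length == 12, s"int96 value must be 12 bytes, got ${bytes.length}")
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val nanosOfDay = buf.getLong()
    val julianDay = buf.getInt().toLong
    val epochDay = julianDay - JulianDayOfUnixEpoch
    // ofEpochSecond normalizes a nano adjustment larger than one second.
    Instant.ofEpochSecond(epochDay * SecondsPerDay, nanosOfDay)
  }

  // Inverse of fromInt96, used here only to build input for the example.
  def toInt96(instant: Instant): Array[Byte] = {
    val epochDay = Math.floorDiv(instant.getEpochSecond, SecondsPerDay)
    val secondOfDay = Math.floorMod(instant.getEpochSecond, SecondsPerDay)
    val nanosOfDay = secondOfDay * 1000000000L + instant.getNano
    val buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
    buf.putLong(nanosOfDay)
    buf.putInt((epochDay + JulianDayOfUnixEpoch).toInt)
    buf.array()
  }
}
```

For example, `ParquetTimestampSketch.fromInt64Millis(1388617201000L)` yields `2014-01-01T23:00:01Z`. Encoding the same instant with `toInt96` and decoding it with `fromInt96` returns the identical value. The int64 form depends on the unit and UTC flag recorded in the schema annotation. The int96 form relies on a fixed layout that has no annotation. The round trip only exercises this sketch and says nothing about how Hive's Parquet reader handles either encoding.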