[ https://issues.apache.org/jira/browse/ARROW-11324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17269576#comment-17269576 ]
Max Burke commented on ARROW-11324:
-----------------------------------

Yup. I've attached a test file. For what it's worth, this is the change that we've applied locally to work around it: [https://github.com/urbanlogiq/arrow/commit/9be88cf2994fe55ae0d2f5ae137b9e73daac1ef0]

[^0100c909-2537-c4dc-ce1d-1b7a75d613e8.parquet]

> [Rust] Querying datetime data in DataFusion with an embedded timezone always fails
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-11324
>                 URL: https://issues.apache.org/jira/browse/ARROW-11324
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Rust - DataFusion
>            Reporter: Max Burke
>            Priority: Blocker
>         Attachments: 0100c909-2537-c4dc-ce1d-1b7a75d613e8.parquet
>
>
> We have a number (~ hundreds of thousands) of Parquet files that have
> embedded Arrow schemas in them that have time-valued columns with the type
> DateTime(TimeUnit::Nanosecond, Some("UTC")).
>
> One of the changes in the Arrow 2 -> 3 working window was to make the Parquet
> loader prefer the Arrow schema compared to the one generated from the
> columns.
>
> But because DataFusion has the timezone field of the DateTime variant
> hardcoded as None, we can't load any of our data after this upgrade; we get
> errors like:
>
> {{SELECT * FROM parquet_table WHERE ("timestamp" >=
> to_timestamp('2010-03-24T13:00:00.000000Z') AND "timestamp" <=
> to_timestamp('2010-03-25T00:00:00.000000Z')) ORDER BY timestamp ASC NULLS
> LAST;}}
>
> {{Plan("\'Timestamp(Nanosecond, Some(\"UTC\")) >= Timestamp(Nanosecond,
> None)\' can\'t be evaluated because there isn\'t a common type to coerce the
> types to")}}
>
> Any ideas/thoughts?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
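
For readers following along, the mismatch in the error above can be illustrated with a small self-contained sketch. Everything in it is an assumption made for illustration: the `TimeUnit` and `DataType` enums are simplified stand-ins rather than the real Arrow types, and `strict_common_type` and `lenient_common_type` are hypothetical helpers, not DataFusion functions. It is not the change in the commit linked above; it only shows why a timezone-aware column and a timezone-less literal fail to find a common type under an exact-match rule, and one way a more lenient rule could behave.

```rust
// Simplified stand-ins for illustration only; NOT the real arrow crate types.
#[derive(Debug, Clone, PartialEq)]
pub enum TimeUnit {
    Microsecond,
    Nanosecond,
}

#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    // Time unit plus an optional timezone name such as "UTC".
    Timestamp(TimeUnit, Option<String>),
    Utf8,
}

/// Hypothetical exact-match rule: two types share a common type only if they
/// are identical, so Some("UTC") versus None is rejected.
pub fn strict_common_type(lhs: &DataType, rhs: &DataType) -> Option<DataType> {
    if lhs == rhs {
        Some(lhs.clone())
    } else {
        None
    }
}

/// Hypothetical lenient rule: timestamps with the same unit are compatible
/// when at most one side names a timezone, or both name the same one.
/// The result keeps whichever timezone is present.
pub fn lenient_common_type(lhs: &DataType, rhs: &DataType) -> Option<DataType> {
    match (lhs, rhs) {
        (DataType::Timestamp(lu, ltz), DataType::Timestamp(ru, rtz)) if lu == ru => {
            let tz = match (ltz, rtz) {
                // Two different explicit timezones: refuse to guess.
                (Some(l), Some(r)) if l != r => return None,
                (Some(tz), _) | (_, Some(tz)) => Some(tz.clone()),
                (None, None) => None,
            };
            Some(DataType::Timestamp(lu.clone(), tz))
        }
        // Everything else falls back to the exact-match rule.
        _ => strict_common_type(lhs, rhs),
    }
}
```

With a column typed `Timestamp(Nanosecond, Some("UTC"))` and a literal typed `Timestamp(Nanosecond, None)`, the strict helper returns `None`, which corresponds to the "no common type" situation in the reported plan error, while the lenient helper returns the UTC-tagged type. Whether ignoring a missing timezone is semantically safe is a separate design question that this sketch does not settle.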