wiedld opened a new pull request, #12199: URL: https://github.com/apache/datafusion/pull/12199
**Not ready for review. Demonstrating CI failure, then pushing next commit which fixes the casting.**

## Which issue does this PR close?

Closes #12118

## Rationale for this change

We have two new view data types, [Utf8View](https://docs.rs/arrow-schema/52.2.0/arrow_schema/enum.DataType.html#variant.Utf8View) and [BinaryView](https://docs.rs/arrow-schema/52.2.0/arrow_schema/enum.DataType.html#variant.BinaryView). Support in DataFusion is part of [this epic](https://github.com/apache/datafusion/issues/11752), and this specific PR adds support for the (de-)serialization of logical and physical plans into the Substrait format.

This PR adds new Substrait variations on existing type classes. For example, there is a "string" Substrait class which can have different variations representing different physical types (e.g. Utf8 vs LargeUtf8 vs Utf8View). If we serialize using string variation=2 (i.e. the view physical type), then the deserialization of variation=2 will give us back the Utf8View. More background is [given here](https://github.com/apache/datafusion/issues/12118#issuecomment-2311396274).

## What changes are included in this PR?

* feat(12118): logical plan support for Utf8View (d7be771eb)
* feat(12118): physical plan support for Utf8View (b17ae25a7)
* feat(12118): logical plan support for BinaryView (8ca5fc147)
* feat(12118): physical plan support for BinaryView (700fe4155)
* TODO: last commit, which fixes the cast handling for binary view. See the CI test failure (demonstrating the need) first.

## Are these changes tested?

Utf8View and BinaryView are covered in the logical plan roundtrip serialization tests.

However, the physical plan roundtrip serialization tests are not yet implemented. There is an [ongoing epic](https://github.com/apache/datafusion/issues/5173) to finish the physical plan serialization. As such, I added the physical plan Substrait handling of Utf8View and BinaryView (to avoid incurring more tech debt), but this code is not tested.

## Are there any user-facing changes?

No API contract change. Using these new data types in Substrait serialization no longer returns unimplemented errors.
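
To illustrate the type-variation roundtrip described in the rationale, here is a minimal, self-contained sketch. It is not the code from this PR: the `StringEncoding` enum, the constant names, and both helper functions are hypothetical stand-ins, and only the "variation 2 means view" mapping comes from the description above. The values 0 and 1 for Utf8 and LargeUtf8 are assumptions made for the example.

```rust
/// Physical string encodings that share the single Substrait "string" type class.
/// Hypothetical local stand-in for the corresponding Arrow `DataType` variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StringEncoding {
    Utf8,
    LargeUtf8,
    Utf8View,
}

// Hypothetical variation references. Only "2 = view" is stated in the PR
// description; 0 and 1 are assumptions for this sketch.
pub const DEFAULT_VARIATION_REF: u32 = 0;
pub const LARGE_VARIATION_REF: u32 = 1;
pub const VIEW_VARIATION_REF: u32 = 2;

/// Serialization side: pick the variation that records the physical encoding.
pub fn to_variation(encoding: StringEncoding) -> u32 {
    match encoding {
        StringEncoding::Utf8 => DEFAULT_VARIATION_REF,
        StringEncoding::LargeUtf8 => LARGE_VARIATION_REF,
        StringEncoding::Utf8View => VIEW_VARIATION_REF,
    }
}

/// Deserialization side: recover the physical encoding from the variation,
/// returning an error for any variation this sketch does not know about.
pub fn from_variation(variation: u32) -> Result<StringEncoding, String> {
    match variation {
        DEFAULT_VARIATION_REF => Ok(StringEncoding::Utf8),
        LARGE_VARIATION_REF => Ok(StringEncoding::LargeUtf8),
        VIEW_VARIATION_REF => Ok(StringEncoding::Utf8View),
        other => Err(format!("unsupported string type variation: {other}")),
    }
}
```

The point of the sketch is that the mapping is lossless in both directions: `from_variation(to_variation(e))` returns `Ok(e)` for every encoding, which is the property the logical plan roundtrip tests rely on. The same pattern would apply to the "binary" class and BinaryView.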
