wiedld opened a new pull request, #12199:
URL: https://github.com/apache/datafusion/pull/12199

   **Not ready for review. Demonstrating CI failure, then pushing next commit 
which fixes the casting.**
   
   ## Which issue does this PR close?
   
   Closes #12118 
   
   ## Rationale for this change
   
   We have two new view data types, 
[Utf8View](https://docs.rs/arrow-schema/52.2.0/arrow_schema/enum.DataType.html#variant.Utf8View)
 and 
[BinaryView](https://docs.rs/arrow-schema/52.2.0/arrow_schema/enum.DataType.html#variant.BinaryView).
 Support in datafusion is part of [this 
epic](https://github.com/apache/datafusion/issues/11752), and this specific PR 
is about adding support for the (de-)serialization of logical and physical 
plans into the substrait format.
   
   This PR adds new Substrait type variations to existing type classes. For 
example, the Substrait "string" class can have different variations 
representing different physical types (e.g. Utf8 vs LargeUtf8 vs Utf8View). If 
we serialize using string variation=2 (i.e. the view physical type), then 
deserializing variation=2 gives us back Utf8View. More background is [given 
here](https://github.com/apache/datafusion/issues/12118#issuecomment-2311396274).
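
The variation mapping described above can be sketched in a few lines of plain Rust. This is a minimal illustration, not the code in this PR: the `StringType` enum is a stand-in for the Arrow `DataType` variants, the function and constant names are hypothetical, and only the view value of 2 comes from the description above (the values 0 and 1 for the default and large variations are assumptions).

```rust
/// Stand-in for the Arrow physical string types (hypothetical, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StringType {
    Utf8,
    LargeUtf8,
    Utf8View,
}

// Variation reference numbers within the Substrait "string" type class.
// Only VIEW = 2 is stated in the PR description; 0 and 1 are assumed here.
pub const DEFAULT_VARIATION_REF: u32 = 0;
pub const LARGE_VARIATION_REF: u32 = 1;
pub const VIEW_VARIATION_REF: u32 = 2;

/// Serialization side: pick the variation that records the physical type.
pub fn to_variation(t: StringType) -> u32 {
    match t {
        StringType::Utf8 => DEFAULT_VARIATION_REF,
        StringType::LargeUtf8 => LARGE_VARIATION_REF,
        StringType::Utf8View => VIEW_VARIATION_REF,
    }
}

/// Deserialization side: recover the physical type from the variation,
/// returning an error for a variation this sketch does not know about.
pub fn from_variation(v: u32) -> Result<StringType, String> {
    match v {
        DEFAULT_VARIATION_REF => Ok(StringType::Utf8),
        LARGE_VARIATION_REF => Ok(StringType::LargeUtf8),
        VIEW_VARIATION_REF => Ok(StringType::Utf8View),
        other => Err(format!("unsupported string type variation: {other}")),
    }
}
```

Under these assumptions, `from_variation(to_variation(StringType::Utf8View))` yields `Ok(StringType::Utf8View)`, which is the roundtrip property the serialization tests rely on.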
   
   ## What changes are included in this PR?
   
   * feat(12118): logical plan support for Utf8View (d7be771eb)
   
   * feat(12118): physical plan support for Utf8View (b17ae25a7)
   
   * feat(12118): logical plan support for BinaryView (8ca5fc147)
   
   * feat(12118): physical plan support for BinaryView (700fe4155)
   
   * TODO -- last commit which fixes the cast handling for binary view. See CI 
test failure (demonstrating need) first.
   
   ## Are these changes tested?
   
   The Utf8View and BinaryView are covered in the logical plan roundtrip 
serialization tests.
   
   However, the physical plan roundtrip serialization tests are not yet 
implemented. There is an [ongoing 
epic](https://github.com/apache/datafusion/issues/5173) to finish the physical 
plan serialization. As such, I added the physical plan Substrait handling of 
Utf8View and BinaryView (to avoid incurring more tech debt), but this code is 
not tested.
   
   ## Are there any user-facing changes?
   
   No API contract change.
   Removal of unimplemented errors when using these new data types in Substrait 
serialization.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

