paleolimbot commented on issue #18223:
URL: https://github.com/apache/datafusion/issues/18223#issuecomment-3474014643
I like the pretty print extension!
> That's a very interesting point! Would using the existing LogicalType
instead of Field resolve this problem? There we don't distinguish between, for
example, Utf8 and Utf8View.
It's close! The LogicalType implements a few things (like Utf8/Utf8View as
the same type) that if swapped into the Expr today might cause some disruption.
None of the below should keep you from pursuing this more ambitious/eutopian
future...I want that future but I also want PRs that can merge and getting
anybody to care about/review this stuff is hard (thank you for caring and
reviewing!).
> In my head, the idea is that the logical type reference will provide
access to these extensions.
Agreed! Inlining that reference is great. Probably the immediate challenge
is ensuring the `Column` references are able to resolve their `dyn` type (maybe
you've solved this already)
What is specified in the Expr/LogicalPlan enum vairants today is basically a
Field whose name we ignore:
```rust
struct ExprType {
// on merge mismatch is an error, casts + printing based on this
storage_type: DataType,
// Alias merges metadata keys, Values error on mismatch, casts and
printing ignore
metadata: FieldMetadata,
// Mostly propagated everywhere but it's not quite clear how this should
work with
// a Cast or Alias
nullability: bool
}
```
The `SerializedTypeView` garbage I was playing with was trying to formalize
a few of the ambiguities with respect to how the information is propagated.
Basically I would like to separate `FieldMetadata` from extension_name/metadata
to make it harder to accidentally overwrite or drop the type information.
```rust
struct ExprType {
// merge mismatch, cast, and printing always consider these three pieces
of information
// the 'safe' but slightly too strict way to do this is to always treat
them with exact equality
storage_type: DataType,
extension_name: String,
extension_metadata: String,
// Not sure whether merge or drop on mismatch is better but at least
nothing
// overwrite or drops the extension type
metadata: FieldMetadata,
// Allows Alias and Cast to propagate the parent expression's nullability
(which they
// currently do)
nullability: Option<bool>
}
```
...maybe swapping in `Arc<dyn TypeExtension>` for `extension_name` and
`extension_metadata` in that example would work? Or maybe an `ExprType` is
`DataType` + `Arc<dyn TypeExtension>` and `ExprField` adds the `FieldMetadata`
and nullability? `ScalarAndMetadata` could maybe then store the `ExprType` as
well? Or something better?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]