thinkharderdev commented on pull request #1677:
URL:
https://github.com/apache/arrow-datafusion/pull/1677#issuecomment-1022500419
> Thank you for this proposal @thinkharderdev
>
> While I am not an expert in Ballista and I will defer to others for this
PR, I did have a few comments:
>
> Using a trait that handled serializing an entire logical plan would allow
use cases like using alternate serializers for the plan (like one could imagine
using JSON for example), but I am not sure how it directly helps serializing
extension points.
>
> What would you think about something like the following:
>
> ```rust
> /// Describes something that knows how to serialize / deserialize the
> /// contents of DataFusion user defined extension points
> pub trait ExtensionSerializer {
>
> /// Serializes a [UserDefinedLogicalNode] into an opaque set of
> /// bytes for transport over the network
> fn serialize_extension_node(node: &dyn UserDefinedLogicalNode) ->
Result<Bytes, BallistaError>;
>
> /// Deserializes a set of bytes created by
> /// [serialize_extension_node] into a new instance of a
> /// [UserDefinedLogicalNode]
> fn deserialize_extension_node(bytes: Bytes) -> Result<Arc<dyn
UserDefinedLogicalNode>, BallistaError>;
>
> // TBD Similar functions for the other extension points
> }
>
> struct DefaultExtensionSerializer {
> }
>
> impl ExtensionSerializer for DefaultExtensionSerializer {
> fn serialize_extension_node(node: &dyn UserDefinedLogicalNode) ->
Result<Bytes> {
> Err(BallistaError::NotImplemented(format!("Default serializer does
not know how to serialize {:?}", node)))
> }
>
> fn deserialize_extension_node(_bytes: Bytes) -> Result<Arc<dyn
UserDefinedLogicalNode>> {
> Err(BallistaError::NotImplemented("Default serializer does not
know how to deserialize user defined extensions".to_string()))
> }
> }
> ```
>
> And an instance of `&dyn ExtensionSerializer` could be passed when
serializing plans?
Yeah, this is what I tried to do originally but it didn't work out very
well. The basic problem I was running into is that since we decoding the
message recursively from the leaves, as soon as you "break out" of the specific
types defined in Ballista, you can no longer use any of the decoding machinery.
For example, if I define a logical node like:
```
struct ReverseAllString {}
impl UserDefinedLogicalNode for ReverseAllString {
...
}
```
I have to encode my `ReverseAllString` struct but also need to encode all of
it's inputs. But I can't define the inputs using the protobuf types from
Ballista and still hook into the machinery which converts from my serialized
type to an actual `LogicalPlan`. At least not any way that I'm aware of (short
of hand-rolling the binary codec).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]