lostluck commented on issue #23278:
URL: https://github.com/apache/beam/issues/23278#issuecomment-1364758660
Thank you for your interest and patience!
Specifically, this issue is about improving the validation that the SDK
provides. In particular, it's not possible for Beam to encode arbitrary
`interface{}` or `any` types as part of a PCollection. PCollections are
required to have a static type at pipeline construction time. This avoids
runtime type errors at pipeline execution time, and allows the execution layer
to optimize how it decodes types.
So the problem here is that the SDK doesn't validate the types in emitter
functions (like `func(T)` or `func(K, V)`) or iterator functions (like
`func(*T) bool`, `func(*K, *V) bool`, or `func(K) func(*V) bool`). It should
check that the `T`, `K`, or `V` in those is a known, registered type that Beam
knows how to encode and decode.
In short, the goal is to make pipeline construction *fail* when the emitter or
iterator types in the signature of a DoFn's main method, `ProcessElement`, are
plain `interface{}`. This would involve updating the `funcx` and `typex`
packages to perform this validation on the emitter and iterator types and fail
them accordingly.
The tricky part, however, is that universal types like `beam.T` are also
`any`. They *are* allowed, but only if, during pipeline construction, they can
be inferred to be a concrete type or are explicitly bound to a specific type.
If you have specific questions, let me know. To me, writing "detailed
instructions" is tantamount to simply doing the work, so smaller, specific
questions will get better responses.