Abacn opened a new issue, #37893:
URL: https://github.com/apache/beam/issues/37893

   ### What needs to happen?
   
   Currently, a Python Row (encoded with RowCoder) that goes through 
serialization/deserialization comes back as a schema-generated NamedTuple type. 
This behavior has several caveats:
   
   - the original type gets lost
   
   - #22714 
   
   With cloudpickle becoming the default pickler, and the schema registry and 
coder registry saved at pipeline submission, we should be able to use the 
schema id registered in the schema registry to recover the user type, then 
look up the registered (row) coder for that type in the coder registry, so 
that `user_type -> GBK` still produces `user_type`.
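   The caveat above can be sketched in plain Python, with no Beam dependency. 
This is an illustration of the described behavior, not Beam's actual coder 
code: the encoder keeps only field names and values (as a positional schema 
would), and the decoder, having no handle on the original class, reconstructs 
values into a freshly generated type (the `BeamSchema_...` name here is 
illustrative).

```python
# Illustration only (no Beam): a round trip that preserves field values
# but not the user-defined type, analogous to RowCoder decoding producing
# a schema-generated NamedTuple instead of the original class.
from collections import namedtuple
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int


def encode(row):
    # Keep only the field names and values, as a positional schema would.
    return row._fields, tuple(row)


def decode(fields, values):
    # The decoder has no reference to the original class, so it
    # generates a new type from the schema alone.
    Generated = namedtuple('BeamSchema_generated', fields)
    return Generated(*values)


p = Point(1, 2)
decoded = decode(*encode(p))

print(decoded == p)            # values survive the round trip
print(type(decoded) is Point)  # but the original type is lost
```

   With the schema id available at decode time, the decoder could instead look 
the user type up in the schema registry and return a genuine `Point`.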
   
   ### Issue Priority
   
   Priority: 2 (default / most normal work should be filed as P2)
   
   ### Issue Components
   
   - [x] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Infrastructure
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
