[ https://issues.apache.org/jira/browse/BEAM-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17122523#comment-17122523 ]
Beam JIRA Bot commented on BEAM-8732:
-------------------------------------

This issue is P2 but has been unassigned without any comment for 60 days so it has been labeled "stale-P2". If this issue is still affecting you, we care! Please comment and remove the label. Otherwise, in 14 days the issue will be moved to P3.

Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed explanation of what these priorities mean.

> Add support for additional structured types to Schemas/RowCoders
> ----------------------------------------------------------------
>
>                 Key: BEAM-8732
>                 URL: https://issues.apache.org/jira/browse/BEAM-8732
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py-core
>            Reporter: Chad Dombrova
>            Priority: P2
>              Labels: stale-P2
>
> Currently we can convert between a {{NamedTuple}} type and its {{Schema}}
> protos using {{named_tuple_from_schema}} and {{named_tuple_to_schema}}. I'd
> like to introduce a system to support additional types, starting with
> structured types like {{attrs}}, {{dataclasses}}, and {{TypedDict}}.
> I've only just started digesting the code, but this task seems pretty
> straightforward. For example, I think the type-to-schema code would look
> roughly like this:
> {code:python}
> def typing_to_runner_api(type_):
>   # type: (Type) -> schema_pb2.FieldType
>   structured_handler = _get_structured_handler(type_)
>   if structured_handler:
>     schema = None
>     if hasattr(type_, 'id'):
>       schema = SCHEMA_REGISTRY.get_schema_by_id(type_.id)
>     if schema is None:
>       fields = structured_handler.get_fields()
>       type_id = str(uuid4())
>       schema = schema_pb2.Schema(fields=fields, id=type_id)
>       SCHEMA_REGISTRY.add(type_, schema)
>     return schema_pb2.FieldType(
>         row_type=schema_pb2.RowType(
>             schema=schema))
> {code}
> The rest of the work would be in implementing a class hierarchy for working
> with structured types, such as getting a list of fields from an instance, and
> instantiation from a list of fields. Eventually we can extend this behavior
> to arbitrary, unstructured types.
> Going in the schema-to-type direction, we have the problem of choosing which
> type to use for a given schema. I believe that as long as
> {{typing_to_runner_api()}} has been called on our structured type in the
> current python session, it should be added to the registry and thus round
> trip ok, so I think we just need a public function for registering schemas
> for structured types.
> [~bhulette] Did you want to tackle this or are you ok with me going after it?
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
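
The class hierarchy and registration function described in the issue could look roughly like the sketch below. This is only an illustration under stated assumptions, not Beam code: `StructuredHandler`, `NamedTupleHandler`, `DataclassHandler`, `get_structured_handler`, and `StructuredTypeRegistry` are hypothetical names, the sketch uses only the Python standard library, and it returns plain `(name, type)` pairs where a real implementation would build `schema_pb2.Field` protos.

```python
import dataclasses
from typing import Any, Dict, List, NamedTuple, Optional, Tuple, get_type_hints
from uuid import uuid4


class StructuredHandler:
    """Hypothetical base class: inspects and instantiates one kind of structured type."""

    def __init__(self, type_: type) -> None:
        self.type_ = type_

    @staticmethod
    def matches(type_: Any) -> bool:
        raise NotImplementedError

    def get_fields(self) -> List[Tuple[str, Any]]:
        """Return (field name, field type) pairs in declaration order."""
        raise NotImplementedError

    def to_values(self, instance: Any) -> List[Any]:
        """Flatten an instance into a list of field values."""
        return [getattr(instance, name) for name, _ in self.get_fields()]

    def from_values(self, values: List[Any]) -> Any:
        """Rebuild an instance from a list of field values."""
        names = [name for name, _ in self.get_fields()]
        return self.type_(**dict(zip(names, values)))


class NamedTupleHandler(StructuredHandler):
    @staticmethod
    def matches(type_: Any) -> bool:
        return (isinstance(type_, type) and issubclass(type_, tuple)
                and hasattr(type_, '_fields'))

    def get_fields(self) -> List[Tuple[str, Any]]:
        hints = get_type_hints(self.type_)
        # Untyped collections.namedtuple fields fall back to Any.
        return [(name, hints.get(name, Any)) for name in self.type_._fields]


class DataclassHandler(StructuredHandler):
    @staticmethod
    def matches(type_: Any) -> bool:
        return isinstance(type_, type) and dataclasses.is_dataclass(type_)

    def get_fields(self) -> List[Tuple[str, Any]]:
        hints = get_type_hints(self.type_)
        return [(f.name, hints[f.name]) for f in dataclasses.fields(self.type_)]


_HANDLER_CLASSES = [NamedTupleHandler, DataclassHandler]


def get_structured_handler(type_: Any) -> Optional[StructuredHandler]:
    """Return a handler for type_, or None if no handler recognizes it."""
    for handler_cls in _HANDLER_CLASSES:
        if handler_cls.matches(type_):
            return handler_cls(type_)
    return None


class StructuredTypeRegistry:
    """Hypothetical registry mapping a schema id to the type it came from."""

    def __init__(self) -> None:
        self._type_by_id: Dict[str, type] = {}
        self._id_by_type: Dict[type, str] = {}

    def register(self, type_: type) -> str:
        """Register type_ (idempotently) and return its schema id."""
        if type_ in self._id_by_type:
            return self._id_by_type[type_]
        if get_structured_handler(type_) is None:
            raise TypeError('no structured handler for %r' % (type_,))
        type_id = str(uuid4())
        self._type_by_id[type_id] = type_
        self._id_by_type[type_] = type_id
        return type_id

    def get_type(self, type_id: str) -> Optional[type]:
        return self._type_by_id.get(type_id)


# Example structured types used to exercise the handlers.
class Point(NamedTuple):
    x: int
    y: int


@dataclasses.dataclass
class User:
    name: str
    age: int
```

With this shape, the schema-to-type direction becomes a lookup by schema id in the registry followed by `from_values` on the matching handler, which is why a type has to be registered in the current session before it can round trip.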