derrickaw commented on code in PR #35952:
URL: https://github.com/apache/beam/pull/35952#discussion_r2302226448
##########
sdks/python/apache_beam/yaml/json_utils.py:
##########
@@ -287,8 +287,9 @@ def row_to_json(beam_type: schema_pb2.FieldType) -> Callable[[Any], Any]:
for field in beam_type.row_type.schema.fields
}
return lambda row: {
- name: convert(getattr(row, name))
+ name: converted
for (name, convert) in converters.items()
+ if (converted := convert(getattr(row, name, None))) is not None
Review Comment:
Figured it out after the Validate_with_schema test failed after the revert :)
There is a bug in that transform for null fields: the validator treats a row as failed whenever the schema declares a type for a field but the field's value is None. So we filter out the None-valued fields and let the validator run on the remaining fields of that row. For example:
BeamSchema_....(name='Bob', score=None, age=25)
During the conversion process to JSON -> {'name': 'Bob', 'score': None, 'age': 25}
Validation will fail on this row with this schema:
{'type': 'object', 'properties': {'name': {'type': 'string'}, 'age': {'type': 'integer'}, 'score': {'type': 'number'}}}
But if we convert that BeamRow to -> {'name': 'Bob', 'age': 25}
then it passes fine.
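A minimal standalone sketch of the behavior described above (not the Beam code itself, just the `jsonschema` package with the example row and schema from this comment): validating the row with `score=None` raises a `ValidationError` because None is not a valid `"number"`, while dropping None-valued keys, as the diff's walrus-operator filter does, lets validation pass:

```python
import jsonschema

schema = {
    'type': 'object',
    'properties': {
        'name': {'type': 'string'},
        'age': {'type': 'integer'},
        'score': {'type': 'number'},
    },
}

row = {'name': 'Bob', 'score': None, 'age': 25}

# None is not a valid "number", so the full row fails validation.
try:
    jsonschema.validate(row, schema)
    failed = False
except jsonschema.ValidationError:
    failed = True

# Drop None-valued fields, mirroring the `is not None` filter in the diff.
filtered = {k: v for k, v in row.items() if v is not None}
jsonschema.validate(filtered, schema)  # passes without raising
```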
My understanding of the code base is that we would have to update the
jsonschema package to allow None, but that seems like a non-starter.