derrickaw commented on code in PR #35952:
URL: https://github.com/apache/beam/pull/35952#discussion_r2302226448
##########
sdks/python/apache_beam/yaml/json_utils.py:
##########
@@ -287,8 +287,9 @@ def row_to_json(beam_type: schema_pb2.FieldType) -> Callable[[Any], Any]:
for field in beam_type.row_type.schema.fields
}
return lambda row: {
- name: convert(getattr(row, name))
+ name: converted
for (name, convert) in converters.items()
+ if (converted := convert(getattr(row, name, None))) is not None
Review Comment:
Figured it out after the Validate_with_schema test failed after the revert :)
There is a bug in that transform for null fields: the validator treats a row as failed whenever the schema declares a type for a field but the field's value is None. So we filter out the None-valued fields and let the validator run on the remaining fields of that row. For example:
BeamSchema_....(name='Bob', score=None, age=25)
During the conversion process to JSON -> {'name': 'Bob', 'score': None, 'age': 25}
Validation will fail on this row with this schema:
{'type': 'object', 'properties': {'name': {'type': 'string'}, 'age': {'type': 'integer'}, 'score': {'type': 'number'}}}
But if we convert that BeamRow to -> {'name': 'Bob', 'age': 25}
then it passes fine.
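A minimal standalone sketch of the behavior described above (not the Beam code itself, just the `jsonschema` package with the example row and schema from this comment): validating the row with `score=None` raises a `ValidationError` because None is not a valid `"number"`, while dropping None-valued keys, as the diff's walrus-operator filter does, lets validation pass:

```python
import jsonschema

schema = {
    'type': 'object',
    'properties': {
        'name': {'type': 'string'},
        'age': {'type': 'integer'},
        'score': {'type': 'number'},
    },
}

row = {'name': 'Bob', 'score': None, 'age': 25}

# None is not a valid "number", so the full row fails validation.
try:
    jsonschema.validate(row, schema)
    failed = False
except jsonschema.ValidationError:
    failed = True

# Drop None-valued fields, mirroring the `is not None` filter in the diff.
filtered = {k: v for k, v in row.items() if v is not None}
jsonschema.validate(filtered, schema)  # passes without raising
```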
My understanding of the code base is that we would have to update the
jsonschema package to allow None, but that seems like a non-starter.