derrickaw commented on code in PR #35952: URL: https://github.com/apache/beam/pull/35952#discussion_r2302226448
########## sdks/python/apache_beam/yaml/json_utils.py: ##########
@@ -287,8 +287,9 @@ def row_to_json(beam_type: schema_pb2.FieldType) -> Callable[[Any], Any]:
           for field in beam_type.row_type.schema.fields
       }
       return lambda row: {
-          name: convert(getattr(row, name))
+          name: converted
           for (name, convert) in converters.items()
+          if (converted := convert(getattr(row, name, None))) is not None

Review Comment:
   Figured it out after the Validate_with_schema test failed after the revert :)

   There is a bug in that transform for null fields: the validator treats a row as failed whenever the schema declares a type for a field but the field's value is None. So we filter out None-valued fields here and let the validator run on what remains of the row.

   For example, given the row:

       BeamSchema_....(name='Bob', score=None, age=25)

   the conversion to JSON currently produces:

       {'name': 'Bob', 'score': None, 'age': 25}

   Validation fails for that row against this schema:

       {'type': 'object', 'properties': {'name': {'type': 'string'}, 'age': {'type': 'integer'}, 'score': {'type': 'number'}}}

   But if we instead convert that Beam row to:

       {'name': 'Bob', 'age': 25}

   then it passes fine. My understanding of the code base is that the alternative would be changing the jsonschema package to allow None, which seems like a non-starter.
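   To illustrate the failure mode, here is a minimal sketch (not the Beam code itself) that only assumes the jsonschema package plus the example schema and row above:

       import jsonschema

       schema = {
           'type': 'object',
           'properties': {
               'name': {'type': 'string'},
               'age': {'type': 'integer'},
               'score': {'type': 'number'},
           },
       }

       row_as_json = {'name': 'Bob', 'score': None, 'age': 25}

       try:
           # None is not of type 'number', so this raises ValidationError.
           jsonschema.validate(row_as_json, schema)
       except jsonschema.ValidationError as e:
           print('rejected:', e.message)

       # Dropping None-valued fields, as the diff above does via the walrus
       # operator in the dict comprehension, yields an instance that validates.
       filtered = {k: v for k, v in row_as_json.items() if v is not None}
       jsonschema.validate(filtered, schema)  # no exception

   The same effect could also be achieved by adding 'null' to each field's allowed types in the generated schema, but filtering on the conversion side keeps the schema untouched.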