I want to use Avro to validate data in JSON objects against a schema.

My expectation was that the schema validation process covers the
following scenarios with appropriate error messages:

1. Required field X is missing in the data. The error message should be
something like "field X not found".
2. Field X has the wrong type. The error message should be something
like "field X expected String, found Integer".
3. Field Y is in the data but is not mentioned in the schema. The error
message should be something like "Unexpected field found: Y".

With the code below, only scenario 1 works as I expected. Scenario 2
fails, but the error message does not name the offending field, and
scenario 3 does not fail at all.

Is there anything wrong with my approach?

Lukas

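// validation method
// (getDecoder(...) is not shown; it is assumed to return an Avro JSON
// decoder for the given schema and input string)
void validate(ObjectNode node) throws IOException {
    Schema schema = SchemaBuilder
        .record("test")
        .fields()
        .requiredString("testField")
        .endRecord();

    String nodeAsString = node.toString();
    // The generic reader produces GenericRecord instances, not Strings
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
    datumReader.read(null, getDecoder(schema, nodeAsString));
}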

// scenarios
JsonNodeFactory factory = JsonNodeFactory.instance;

// 1. Required field missing
ObjectNode node = factory.objectNode();
node.put("xyz", "foo");

validate(node); // Result: "Expected field name not found: testField"

// 2. Required field has wrong type
node = factory.objectNode();
node.put("testField", 1);
validate(node); // Result: "Expected string. Got VALUE_NUMBER_INT"
// (The name of the field that has the wrong type is not part of the
// message, which is less helpful if there are multiple fields.)

// 3. Extraneous field
node = factory.objectNode();
node.put("testField", "foo");
node.put("xyz", "foo");
validate(node); // There is no error even though the specified JSON
// object contains data that the schema does not define
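
For scenario 3, a separate pre-check could compare the field names in the JSON object against the names the schema declares. The following is only a sketch of that idea using plain Java sets. The class and method names are hypothetical, and in real code the two sets would presumably be filled from `schema.getFields()` and `node.fieldNames()`. It does not change how the Avro decoder itself behaves.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Avro or Jackson.
class ExtraFieldCheck {

    // Returns the names present in the data but not declared in the schema.
    static Set<String> unexpectedFields(Set<String> schemaFields, Set<String> dataFields) {
        Set<String> extra = new TreeSet<>(dataFields); // copy, so the inputs stay untouched
        extra.removeAll(schemaFields);                 // set difference: data minus schema
        return extra;
    }

    public static void main(String[] args) {
        Set<String> schemaFields = Set.of("testField");      // names declared in the schema
        Set<String> dataFields = Set.of("testField", "xyz"); // names found in the JSON object
        for (String name : unexpectedFields(schemaFields, dataFields)) {
            System.out.println("Unexpected field found: " + name);
        }
    }
}
```

With the sets from scenario 3 this reports `xyz`. For nested records the same comparison would have to be repeated per level, which this sketch does not attempt.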
