[
https://issues.apache.org/jira/browse/AVRO-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721364#action_12721364
]
Doug Cutting commented on AVRO-29:
----------------------------------
> The parsing table can be considered as a binary version of schema.
Would this be better than simply defining a schema for schemas, something like
the following?
{code}
{"type" : "record", "name" : "Schema",
 "fields" : [
   {"name": "type", "type": ["Record", "Map", "Array", "Union", "String",
                             "Bytes", "Long", ...]}
 ]
}
{"type" : "record", "name" : "Record",
 "fields" : [
   {"name": "name", "type": "string"},
   {"name": "fields", "type": {"type" : "array", "items": "Field"}}
 ]
}
{"type" : "record", "name" : "Field",
 "fields" : [
   {"name": "name", "type": "string"},
   {"name": "schema", "type": "Schema"}
 ]
}
{"type" : "record", "name" : "Array",
 "fields" : [{"name": "items", "type": "Schema"}]
}
{"type" : "record", "name" : "Map",
 "fields" : [{"name": "values", "type": "Schema"}]
}
{"type" : "record", "name" : "Union",
 "fields" : [{"name": "types", "type": {"type" : "array", "items": "Schema"}}]
}
{"type" : "record", "name" : "String", "fields" : []}
{"type" : "record", "name" : "Bytes", "fields" : []}
{"type" : "record", "name" : "Long", "fields" : []}
{"type" : "record", "name" : "Double", "fields" : []}
{code}
Such a schema could be included in Avro, and any schema could be efficiently
serialized in binary with it. Would a parsing table be substantially more
efficient?
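
To make the question concrete, here is a minimal Python sketch of what serializing a schema with such a schema-for-schemas might look like. It is not part of Avro or of the patch. The function names are hypothetical, the union branch order is copied from the sketch above (the branches hidden behind "..." are left out), and the byte layout assumes Avro's usual binary rules: zig-zag varint longs, length-prefixed strings, a branch index in front of a union value, and block-encoded arrays. References to previously defined named types are not handled.

```python
# Branch order copied from the union in the sketch above; the branches
# hidden behind its "..." are deliberately not guessed at.
BRANCHES = ["Record", "Map", "Array", "Union", "String", "Bytes", "Long"]
PRIMITIVES = {"string": "String", "bytes": "Bytes", "long": "Long"}


def encode_long(n):
    """Avro long: zig-zag encode, then base-128 varint, low group first."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s):
    """Avro string: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_array(items, encode_item):
    """Avro array: one block (count, then items), closed by a zero count."""
    out = b""
    if items:
        out += encode_long(len(items))
        out += b"".join(encode_item(item) for item in items)
    return out + encode_long(0)


def branch(name):
    """Union value prefix: the index of the chosen branch."""
    return encode_long(BRANCHES.index(name))


def encode_field(field):
    # The JSON key "type" plays the role of the "schema" field of "Field".
    return encode_string(field["name"]) + encode_schema(field["type"])


def encode_schema(schema):
    """Encode a JSON schema as a value of the "Schema" record."""
    if isinstance(schema, str):
        # Primitive: branch index only, since its record has no fields.
        return branch(PRIMITIVES[schema])
    if isinstance(schema, list):
        return branch("Union") + encode_array(schema, encode_schema)
    kind = schema["type"]
    if kind == "record":
        return (branch("Record")
                + encode_string(schema["name"])
                + encode_array(schema["fields"], encode_field))
    if kind == "array":
        return branch("Array") + encode_schema(schema["items"])
    if kind == "map":
        return branch("Map") + encode_schema(schema["values"])
    raise ValueError("type not covered by this sketch: %r" % (kind,))


point = {"type": "record", "name": "Point",
         "fields": [{"name": "x", "type": "long"},
                    {"name": "y", "type": "long"}]}
encoded = encode_schema(point)
```

Under these assumptions the two-field Point record comes out at 15 bytes, most of which are the names themselves.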
> Validation and resolution for ValueInput/ValueOutput
> ----------------------------------------------------
>
> Key: AVRO-29
> URL: https://issues.apache.org/jira/browse/AVRO-29
> Project: Avro
> Issue Type: Improvement
> Components: java
> Reporter: Raymie Stata
> Assignee: Thiruvalluvan M. G.
> Attachments: AVRO-29.patch, AVRO-29.patch
>
>
> This is a companion to AVRO-25, which introduced the classes ValueOutput and
> ValueInput. This patch adds two capabilities: validation of
> ValueInput/Output calls against a schema, and schema-resolution implemented
> in the context of ValueInput.
> ValidatingValueInput and ValidatingValueOutput take a schema and will
> validate calls against that schema. For example, if the schema calls for a
> record consisting of two longs and a double, then ValidatingValueInput will
> allow the call-sequence readLong, readLong, readDouble and throw an error
> otherwise.
> ResolvingValueInput takes two schemas, the writer's and the reader's schema,
> and automatically performs Avro's schema-resolution logic on behalf of the
> reader. For example, if the writer's schema calls for a long, and the
> reader's calls for a double, then the reader can call readDouble, and
> ResolvingValueInput will automatically decode the long sent by the writer and
> convert it into the double expected by the reader.
> ResolvingValueInput is an alternative to Avro's current GenericDatumReader,
> which also implements Avro's resolution logic. In many use-cases, the
> programmer has their own data structures into which they want to store data
> read from an Avro stream, data structures that cannot easily be put into the
> GenericRecord/Array class hierarchy. With ResolvingValueInput, programmers
> get the benefit of this resolution logic without being forced into the
> GenericRecord/Array class hierarchy.
> We recommend that ResolvingValueInput become the standard implementation of
> the resolution logic, and that GenericDatumReader be implemented in terms of
> ResolvingValueInput. However, we haven't implemented this change pending
> feedback from others.
> We haven't implemented default values, but can add that feature.
> Implementation note: this patch is implemented by translating Avro schemas to
> LL(1) parsing tables. This translation is straightforward, but tedious. If
> you want to understand how the code works, we recommend that you look in the
> file "parsing.html" (included in the patch), which explains the translation.
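
For readers who want a feel for the two examples in the description above, the following Python sketch may help. It is only an illustration under stated assumptions, not the code in the patch: the class and method names are hypothetical, only records of long and double are handled, and the writer's and reader's schemas are assumed to list their fields in the same order. For a record of primitives the grammar has a single production, so a stack of expected symbols stands in for the parsing table here; unions and arrays would need real table lookups. The byte layout assumes Avro's usual encoding (zig-zag varint longs, 8-byte little-endian doubles).

```python
import struct


class ValidationError(Exception):
    """Raised when a read call does not match what the schema expects next."""


def flatten(schema):
    """Expand a schema into the primitive reads it calls for, in order."""
    if isinstance(schema, str):
        return [schema]
    if schema["type"] == "record":
        return [sym for f in schema["fields"] for sym in flatten(f["type"])]
    raise ValueError("type not covered by this sketch")


class ValidatingInput:
    def __init__(self, schema, data):
        self.expected = flatten(schema)[::-1]  # stack: next symbol on top
        self.data = data
        self.pos = 0

    def _expect(self, symbol):
        if not self.expected or self.expected[-1] != symbol:
            want = self.expected[-1] if self.expected else "end of data"
            raise ValidationError(
                "schema expects %s, caller asked for %s" % (want, symbol))
        self.expected.pop()

    def _decode_long(self):
        shift = result = 0
        while True:
            byte = self.data[self.pos]
            self.pos += 1
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                break
            shift += 7
        return (result >> 1) ^ -(result & 1)  # undo zig-zag

    def _decode_double(self):
        (value,) = struct.unpack_from("<d", self.data, self.pos)
        self.pos += 8
        return value

    def read_long(self):
        self._expect("long")
        return self._decode_long()

    def read_double(self):
        self._expect("double")
        return self._decode_double()


class ResolvingInput(ValidatingInput):
    """Checks calls against the reader's schema, decodes the writer's data."""

    def __init__(self, writer_schema, reader_schema, data):
        super().__init__(reader_schema, data)
        self.written = flatten(writer_schema)[::-1]

    def read_long(self):
        self._expect("long")
        if self.written.pop() != "long":
            raise ValidationError("writer's value cannot be read as long")
        return self._decode_long()

    def read_double(self):
        self._expect("double")
        written = self.written.pop()
        if written == "long":
            # Promotion: decode the writer's long, hand back a double.
            return float(self._decode_long())
        if written == "double":
            return self._decode_double()
        raise ValidationError("writer's value cannot be read as double")
```

With a schema of two longs and a double, calling read_long, read_long, read_double on a ValidatingInput succeeds, while starting with read_double raises ValidationError. A ResolvingInput built with writer schema "long" and reader schema "double" accepts read_double and returns the writer's long as a float.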