[jira] Commented: (AVRO-29) Validation and resolution for ValueInput/ValueOutput

Doug Cutting (JIRA) Thu, 18 Jun 2009 10:34:32 -0700

    [ 
https://issues.apache.org/jira/browse/AVRO-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721364#action_12721364
 ]


Doug Cutting commented on AVRO-29:
----------------------------------

> The parsing table can be considered as a binary version of schema.

Would this be better than simply defining a schema for schemas, something like 
the following?

{code}
{"type" : "record", "name" : "Schema",
 "fields" : [
  {"name": "type", "type": ["Record", "Map", "Array", "Union", "String",
                            "Bytes", "Long", ...]}
 ]
}

{"type" : "record", "name" : "Record",
 "fields" : [
  {"name": "name", "type": "string"},
  {"name": "fields", "type": {"type" : "array", "items": "Field"}}
 ]
}

{"type" : "record", "name" : "Field",
 "fields" : [
  {"name": "name", "type": "string"},
  {"name": "schema", "type": "Schema"}
 ]
}

{"type" : "record", "name" : "Array",
 "fields" : [{"name": "items", "type": "Schema"}]
}

{"type" : "record", "name" : "Map",
 "fields" : [{"name": "values", "type": "Schema"}]
}

{"type" : "record", "name" : "Union",
 "fields" : [{"name": "types", "type": {"type" : "array", "items": "Schema"}}]
}

{"type" : "record", "name" : "String", "fields" : []}
{"type" : "record", "name" : "Bytes", "fields" : []}
{"type" : "record", "name" : "Long", "fields" : []}
{"type" : "record", "name" : "Double", "fields" : []}
{code}

Such a schema could be included in Avro, and any schema could be efficiently 
serialized in binary with it.  Would a parsing table be substantially more 
efficient?
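
To make the idea concrete, here is a rough sketch of what serializing a schema with such a schema-for-schemas might look like.  It is not code from Avro or from the patch: the class and method names are made up, the union branch numbering is an assumption taken from the order of the list above (branches behind the "..." are left out), and the byte layout only loosely follows Avro's binary encoding (zig-zag varint longs, length-prefixed strings, count-prefixed array blocks ended by a zero).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SchemaBinarySketch {
    // Assumed union branch order, copied from the list in the comment above.
    // Branches hidden behind its "..." are deliberately left out.
    enum Kind { RECORD, MAP, ARRAY, UNION, STRING, BYTES, LONG }

    // Hypothetical in-memory schema node; not an Avro class.
    static final class Node {
        final Kind kind;
        final String name;              // record name; null for other kinds
        final List<String> fieldNames;  // record field names; empty otherwise
        final List<Node> children;      // field schemas, items, values, or branches

        Node(Kind kind, String name, List<String> fieldNames, List<Node> children) {
            this.kind = kind;
            this.name = name;
            this.fieldNames = fieldNames;
            this.children = children;
        }
    }

    static Node prim(Kind kind) {
        return new Node(kind, null, Collections.<String>emptyList(),
                        Collections.<Node>emptyList());
    }

    static Node record(String name, List<String> fieldNames, List<Node> fieldSchemas) {
        return new Node(Kind.RECORD, name, fieldNames, fieldSchemas);
    }

    // Zig-zag varint encoding of a long.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Length-prefixed UTF-8 string.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    static void writeSchema(ByteArrayOutputStream out, Node s) {
        writeLong(out, s.kind.ordinal());  // index of the chosen union branch
        switch (s.kind) {
            case RECORD:
                writeString(out, s.name);
                if (!s.children.isEmpty()) writeLong(out, s.children.size());
                for (int i = 0; i < s.children.size(); i++) {
                    writeString(out, s.fieldNames.get(i));  // Field.name
                    writeSchema(out, s.children.get(i));    // Field.schema
                }
                writeLong(out, 0);  // end of the "fields" array
                break;
            case MAP:
            case ARRAY:
                writeSchema(out, s.children.get(0));  // "values" or "items"
                break;
            case UNION:
                if (!s.children.isEmpty()) writeLong(out, s.children.size());
                for (Node branch : s.children) writeSchema(out, branch);
                writeLong(out, 0);  // end of the "types" array
                break;
            default:
                break;  // String, Bytes, Long are empty records: no bytes at all
        }
    }

    static byte[] encode(Node s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeSchema(out, s);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Node pair = record("Pair", Arrays.asList("a", "b"),
                           Arrays.asList(prim(Kind.LONG), prim(Kind.STRING)));
        System.out.println(encode(pair).length + " bytes");
    }
}
```

Under these assumptions a two-field record named "Pair" (a long and a string) comes out at 14 bytes, most of them the names themselves, which gives a rough baseline for comparing against a parsing table.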

> Validation and resolution for ValueInput/ValueOutput
> ----------------------------------------------------
>
>                 Key: AVRO-29
>                 URL: https://issues.apache.org/jira/browse/AVRO-29
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Raymie Stata
>            Assignee: Thiruvalluvan M. G.
>         Attachments: AVRO-29.patch, AVRO-29.patch
>
>
> This is a companion to AVRO-25, which introduced the classes ValueOutput and 
> ValueInput.  This patch adds two capabilities: validation of 
> ValueInput/Output calls against a schema, and schema-resolution implemented 
> in the context of ValueInput.
> ValidatingValueInput and ValidatingValueOutput take a schema and will 
> validate calls against that schema.  For example, if the schema calls for a 
> record consisting of two longs and a double, then ValidatingValueInput will 
> allow the call-sequence readLong, readLong, readDouble and throw an error 
> otherwise.
> ResolvingValueInput takes two schemas, the writer's and the reader's schema, 
> and automatically performs Avro's schema-resolution logic on behalf of the 
> reader.  For example, if the writer's schema calls for a long, and the 
> reader's calls for a double, then the reader can call readDouble, and 
> ResolvingValueInput will automatically decode the long sent by the writer and 
> convert it into the double expected by the reader.
> ResolvingValueInput is an alternative to Avro's current GenericDatumReader, 
> which also implements Avro's resolution logic.  In many use-cases, the 
> programmer has their own data structures into which they want to store data 
> read from an Avro stream, data structures that cannot easily be put into the 
> GenericRecord/Array class hierarchy.  With ResolvingValueInput, programmers 
> get the benefit of this resolution logic without being forced into the 
> GenericRecord/Array class hierarchy.
> We recommend that ResolvingValueInput become the standard implementation of 
> the resolution logic, and that GenericDatumReader be implemented in terms of 
> ResolvingValueInput.  However, we haven't implemented this change pending 
> feedback from others.
> We haven't implemented default values, but can add that feature.
> Implementation note: this patch is implemented by translating Avro schemas to 
> LL(1) parsing tables.  This translation is straightforward, but tedious.  If 
> you want to understand how the code works, we recommend that you look in the 
> file "parsing.html" (included in the patch), which explains the translation.
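
The validation example in the quoted description (a record of two longs and a double) can be illustrated with a small sketch.  This is not the patch's code: the class, enum and method names are hypothetical, and the record is flattened by hand into a fixed sequence of expected symbols, which is only the degenerate, choice-free case of the LL(1) tables the patch is said to build.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class ValidatingSketch {
    // Hypothetical terminal symbols; a real table would cover every Avro type.
    enum Symbol { LONG, DOUBLE }

    // Symbols the schema still expects, in order.
    private final Deque<Symbol> expected = new ArrayDeque<>();

    // Assumes the caller has already flattened the record schema by hand.
    ValidatingSketch(Symbol... flattenedRecord) {
        expected.addAll(Arrays.asList(flattenedRecord));
    }

    // Accepts the call only if it matches the next expected symbol.
    private void advance(Symbol actual) {
        Symbol next = expected.peek();
        if (next != actual) {
            String wanted = (next == null) ? "no further values" : next.toString();
            throw new IllegalStateException(
                "schema expects " + wanted + " but caller used " + actual);
        }
        expected.pop();
    }

    void writeLong(long value) {
        advance(Symbol.LONG);
        // A real encoder would emit the value here; this sketch only validates.
    }

    void writeDouble(double value) {
        advance(Symbol.DOUBLE);
    }

    // True once every value required by the schema has been supplied.
    boolean done() {
        return expected.isEmpty();
    }

    public static void main(String[] args) {
        ValidatingSketch out =
            new ValidatingSketch(Symbol.LONG, Symbol.LONG, Symbol.DOUBLE);
        out.writeLong(1L);
        out.writeLong(2L);
        out.writeDouble(3.0);
        System.out.println("complete: " + out.done());
    }
}
```

Calling writeDouble where the schema still expects a long raises an IllegalStateException in this sketch; the actual patch may well signal errors differently.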

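The resolution example in the quoted description (writer sends a long, reader asks for a double) might be sketched as follows.  Again this is only an illustration with made-up names, not ResolvingValueInput itself; it assumes the writer's long is a zig-zag varint and a writer's double is eight little-endian bytes, and it handles just these two writer types.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ResolvingSketch {
    // Hypothetical subset of writer types; only what the example needs.
    enum Type { LONG, DOUBLE }

    private final Type writerType;
    private final ByteBuffer in;

    ResolvingSketch(Type writerType, byte[] data) {
        this.writerType = writerType;
        this.in = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Decodes a zig-zag varint long from the input.
    private long decodeLong() {
        long z = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    // The reader's schema calls for a double; what is decoded depends on
    // what the writer's schema says was actually written.
    double readDouble() {
        switch (writerType) {
            case LONG:
                return (double) decodeLong();  // promote the writer's long
            case DOUBLE:
                return in.getDouble();         // same type, read directly
            default:
                throw new IllegalStateException("cannot resolve " + writerType);
        }
    }

    public static void main(String[] args) {
        // 0x06 is the zig-zag varint encoding of the long 3.
        ResolvingSketch r = new ResolvingSketch(Type.LONG, new byte[] {0x06});
        System.out.println(r.readDouble());
    }
}
```

The reader's code only ever calls readDouble; the choice of how to decode is driven entirely by the writer's schema, which is the point of doing resolution inside the input class.
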
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
