On May 3, 2010, at 10:03 AM, Doug Cutting wrote:
> Scott Carey wrote:
>> There has been talk that AvroGen would handle features like this (as well as
>> many others) in time. However this is one that should probably be addressed
>> at the JSON level regardless of the future direction of AvroGen.
>
> Note that JSON schemas and protocols need to be standalone, containing
> the full lexical closure of schemas referenced, when they are included
> in data files and exchanged in RPC handshakes without reference to
> external data. Thus I am reluctant to add a JSON syntax for file
> inclusion. Rather, I think a pre-processor is appropriate. The
> pre-processor would not be run on schemas included in files or exchanged
> in RPC handshakes, but would be run for schemas read from files.
Exactly. I don't think we should change the JSON syntax by adding
references or includes.
We should just make the SpecificCompiler capable of reading a collection of
files and figuring out how to compile them when there is not full lexical
closure in a .avsc file.
File formats and RPCs have much stricter requirements than the
SpecificCompiler.
>
> I have experimented with using the m4 pre-processor for this purpose,
> and found it a bit awkward. Perhaps someone can develop macros for m4
> that make it palatable, or perhaps we can develop a custom pre-processor
> for JSON.
>
> We might exploit otherwise-illegal JSON syntax, like backquotes, for
> pre-processor directives. An include might look something like:
>
> {"protocol": "org.foo.BarProtocol",
> "types": [
> `include org.foo.Bar`,
> ...
> ]
> }
>
Rather than use a preprocessor, is it possible to have the SpecificCompiler
search the other files in the set for types that can't be found in the current
file? The result will be SpecificRecord objects that have their SCHEMA$ field
populated with a schema that has full lexical closure.
Essentially, if given two files:
IpTypes.avsc --
[{"name": "com.somewhere.avro.IPV4", "type": "fixed", "size":4},
{"name": "com.somewhere.avro.IPV6", "type": "fixed", "size":16}]
MyRecord.avsc --
{"name": "com.somewhere.avro.MyRecord", "type": "record", "fields": [
{"name": "hostname", "type": "string"},
{"name": "IP", "type": [ "IPV4", "IPV6" ]}
]}
The SpecificCompiler could compile MyRecord.avsc if concurrently given
IpTypes.avsc to resolve the "IPV4" and "IPV6" unknown references. Perhaps it
could also compile if it is aware of a SpecificRecord Java class that has an
appropriate schema. Doing this with a preprocessor would be tricky, especially in a
namespace-appropriate way, and a preprocessor could not support integration with
already compiled SpecificRecord classes.
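As a rough sketch of the resolution step I have in mind (Python for brevity, and not the SpecificCompiler itself): gather every named type from the whole set of files, then inline a definition the first time an unknown name is referenced. The function names collect and close are made up for illustration, and the sketch assumes only records, enums, fixed, unions, arrays and maps, with every named type living in a namespace.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED = {"record", "enum", "fixed"}


def full_name(name, namespace):
    """Qualify a simple name with the enclosing namespace, if there is one."""
    if "." in name or not namespace:
        return name
    return namespace + "." + name


def collect(schema, namespace, table):
    """Store every named type found in `schema` in `table`, keyed by full name."""
    if isinstance(schema, list):
        for branch in schema:
            collect(branch, namespace, table)
    elif isinstance(schema, dict):
        if schema.get("type") in NAMED:
            name = full_name(schema["name"], schema.get("namespace", namespace))
            table[name] = dict(schema, name=name)
            namespace = name.rpartition(".")[0]
            for field in schema.get("fields", []):
                collect(field["type"], namespace, table)
        for key in ("items", "values"):  # array and map element types
            if key in schema:
                collect(schema[key], namespace, table)


def close(schema, namespace, table, defined):
    """Return a copy of `schema` in which each unknown reference is replaced
    by its definition on first use, so the result is self-contained."""
    if isinstance(schema, str):
        if schema in PRIMITIVES:
            return schema
        name = full_name(schema, namespace)
        if name in defined:
            return name  # already defined earlier; a name reference is enough
        if name not in table:
            raise KeyError("unresolved type: " + name)
        return close(table[name], namespace, table, defined)
    if isinstance(schema, list):
        return [close(branch, namespace, table, defined) for branch in schema]
    out = dict(schema)
    if schema.get("type") in NAMED:
        name = full_name(schema["name"], schema.get("namespace", namespace))
        defined.add(name)
        out["name"] = name  # keep the full name so the definition is valid anywhere
        out.pop("namespace", None)
        namespace = name.rpartition(".")[0]
        if "fields" in schema:
            out["fields"] = [
                dict(field, type=close(field["type"], namespace, table, defined))
                for field in schema["fields"]
            ]
    for key in ("items", "values"):
        if key in schema:
            out[key] = close(schema[key], namespace, table, defined)
    return out


# The two files from the example above, inlined as strings.
ip_types = json.loads("""
[{"name": "com.somewhere.avro.IPV4", "type": "fixed", "size": 4},
 {"name": "com.somewhere.avro.IPV6", "type": "fixed", "size": 16}]
""")
my_record = json.loads("""
{"name": "com.somewhere.avro.MyRecord", "type": "record", "fields": [
  {"name": "hostname", "type": "string"},
  {"name": "IP", "type": ["IPV4", "IPV6"]}
]}
""")

table = {}
for parsed in (ip_types, my_record):
    collect(parsed, None, table)

closed = close(my_record, None, table, set())
print(json.dumps(closed, indent=2))
```

For the two files above this yields MyRecord with the IPV4 and IPV6 fixed definitions inlined in the IP union, which is the kind of schema that could go into the generated class. Looking types up on the classpath would be another source feeding the same table.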
Perhaps IPV4 and IPV6 are already compiled SpecificRecord classes in jar
"CommonTypes.jar" -- SpecificCompiler could run with those in its classpath and
a directive to look for valid types in its classpath in addition to the files.
The MyRecord.avsc file above does not contain a fully valid Avro schema, so
perhaps we could denote this with a different file extension.
> Also note that a protocol file (.avpr) need not actually define any
> messages but can be used to define a set of types that reference one
> another. This is a stopgap, but a useful one.
>
> Doug