[ 
https://issues.apache.org/jira/browse/AVRO-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339950#comment-17339950
 ] 

Feroze Daud commented on AVRO-3026:
-----------------------------------

Hi!

Finally had a chance to try it out. The `@tags` annotation works, but there 
doesn't seem to be a way to add the `doc` attribute, which the AVSC schema 
allows, to the output file.

How do we achieve that?

For example, this causes an error:
{noformat}
    record employee {
        string @doc("employee name") name;
        boolean active;
        long salary;
    }
 {noformat}
 

Output:

 
{noformat}
$ java -jar ~/DevTools/avro-tools-1.10.1.jar idl employee.avdl | jq .
Exception in thread "main" org.apache.avro.AvroRuntimeException: Can't set reserved property: doc
    at org.apache.avro.JsonProperties.addProp(JsonProperties.java:281)
    at org.apache.avro.JsonProperties.access$000(JsonProperties.java:121)
    at org.apache.avro.JsonProperties$1.addProp(JsonProperties.java:127)
    at org.apache.avro.util.internal.Accessor.addProp(Accessor.java:101)
    at org.apache.avro.compiler.idl.Idl.VariableDeclarator(Idl.java:664)
    at org.apache.avro.compiler.idl.Idl.FieldDeclaration(Idl.java:606)
    at org.apache.avro.compiler.idl.Idl.RecordDeclaration(Idl.java:569)
    at org.apache.avro.compiler.idl.Idl.NamedSchemaDeclaration(Idl.java:153)
    at org.apache.avro.compiler.idl.Idl.ProtocolBody(Idl.java:402)
    at org.apache.avro.compiler.idl.Idl.ProtocolDeclaration(Idl.java:227)
    at org.apache.avro.compiler.idl.Idl.CompilationUnit(Idl.java:117)
    at org.apache.avro.tool.IdlTool.run(IdlTool.java:61)
    at org.apache.avro.tool.Main.run(Main.java:67)
    at org.apache.avro.tool.Main.main(Main.java:56)
{noformat}
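
My reading of the trace, as a rough sketch rather than Avro's real code: the 
IDL parser appears to hand every `@name(value)` annotation on a field to 
`JsonProperties.addProp`, which refuses names that are reserved. The Python 
model below is hypothetical. `RESERVED_FIELD_PROPS`, `ReservedPropertyError` 
and `add_prop` are names made up for illustration, and the reserved set is an 
assumption based on the field attributes listed in the Avro spec.

```python
# Illustrative model only; this is not Avro's actual implementation.
# Assumption: the reserved names mirror the field attributes in the Avro spec.
RESERVED_FIELD_PROPS = {"name", "type", "doc", "default", "order", "aliases"}


class ReservedPropertyError(Exception):
    """Hypothetical stand-in for org.apache.avro.AvroRuntimeException."""


def add_prop(props: dict, name: str, value) -> None:
    """Attach a custom property, refusing names the schema format reserves."""
    if name in RESERVED_FIELD_PROPS:
        raise ReservedPropertyError(f"Can't set reserved property: {name}")
    props[name] = value


field_props = {}
add_prop(field_props, "tags", ["Name"])  # custom property: accepted

try:
    add_prop(field_props, "doc", "employee name")  # reserved name: rejected
    doc_rejected = False
except ReservedPropertyError as err:
    doc_rejected = True
    message = str(err)
```

If that reading is right, it would explain why `@tags` goes through while 
`@doc` fails with the message above.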

> Allow custom annotations in IDL files and support translating them to AVSC 
> Avro.
> --------------------------------------------------------------------------------
>
>                 Key: AVRO-3026
>                 URL: https://issues.apache.org/jira/browse/AVRO-3026
>             Project: Apache Avro
>          Issue Type: New Feature
>          Components: spec
>    Affects Versions: 1.9.0, 1.9.1, 1.9.2, 1.10.1
>            Reporter: Feroze Daud
>            Priority: Major
>
> h2. Introduction
> Our company has standardized on Avro schemas for all data ingestion and 
> storage. As part of this, and to satisfy CCPA, we need to be able to tag 
> records and fields appropriately if they contain PI, non-PI information, etc.
> Avro AVSC files, being valid JSON, can easily be modified to add tags that 
> will be used by downstream processors and won't interfere with Avro 
> itself (generating POJOs, serialization, deserialization, etc.).
> One such key we chose is simply called *tags*. Its example usage is shown 
> below.
> {code:java}
> {
>    "type": "record",
>    "name": "PropertyOwner",
>    "namespace": "com.acme.Property", 
>    "tags": ["PI", "PII" ],
>    "fields": [
>    {
>       "name": "FullName",
>       "type": "string",
>       "tags": ["Name"]
>    },
>    {
>        "name": "PhoneNumber",
>        "type": "string",
>        "tags": ["Phone"]
>    }]
> }{code}
>  
> These tags can be processed by downstream processors, and the data landing 
> in a data lake or database can be tagged appropriately.
>  
> h2. Problem Description
> While tagging works fine for AVSC, because adding extra fields doesn't make 
> it invalid, we have a problem when using IDL to author schemas. The IDL spec 
> does not provide a way to add extra tags that are copied over to the Avro 
> schema.
>  
> h2. Proposal
> I propose that we allow a special *@annotation* tag. This tag can be 
> applied to records and fields. Whatever is in this annotation should be 
> copied verbatim to the output AVSC.
> For example:
> {code:java}
> @annotation("tags", "[\"PI\", \"Non PI\"]")
> record Employee {
>   @annotation("tags", "[\"Name\"]")
>   string fullName;
>   boolean active = true;
>   long salary;
>   @annotation("tags", "[\"Phone\"]")
>   string phone;
> } {code}
>  
> would generate an Avro schema as follows:
>  
> {code:java}
>  {
>  "type": "record",
>  "name": "Employee",
>  "tags": ["PI", "Non PI"],
>  "fields": [
>  {
>  "name": "fullName",
>  "type": "string",
>  "tags": ["Name"]
>  },
>  {
>  "name": "active",
>  "type": "boolean",
>  "default": true
>  },
>  {
>  "name": "salary",
>  "type": "long"
>  },
>  {
>  "name": "phone",
>  "type": "string",
>  "tags": ["Phone"]
>  }]
> }{code}
>  
> As you can see, we don't need to validate that the *@annotation* value is 
> well-formed JSON. It just takes a string, and we render it into the output 
> JSON under the given key.
> @annotation("foo", "[\"bar\"]") -> "foo": ["bar"]
> @annotation("foo", "{\"bar\": \"jar\"}") -> "foo": {"bar": "jar"}
>  
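
The rendering rule in the quoted proposal can be sketched in a few lines. 
This is only an illustration under stated assumptions: `apply_annotation` is 
a hypothetical helper, the first annotation argument is taken to be the 
property name, and the value is decoded with `json.loads`, which is stricter 
than the verbatim copy the proposal describes. It is not how the Avro IDL 
compiler works.

```python
import json


def apply_annotation(node: dict, key: str, raw_value: str) -> dict:
    """Return a copy of `node` with `key` set to the decoded annotation value.

    Assumption: `raw_value` is the JSON text passed as the second argument
    of the proposed @annotation(key, value) syntax.
    """
    updated = dict(node)
    updated[key] = json.loads(raw_value)  # raises ValueError on malformed JSON
    return updated


# Record-level annotation: @annotation("tags", "[\"PI\", \"Non PI\"]")
record = {"type": "record", "name": "Employee", "fields": []}
record = apply_annotation(record, "tags", '["PI", "Non PI"]')

# Field-level annotation: @annotation("tags", "[\"Phone\"]")
phone = apply_annotation({"name": "phone", "type": "string"}, "tags", '["Phone"]')
record["fields"].append(phone)

# Serialize the annotated schema as AVSC-style JSON text.
avsc = json.dumps(record, indent=2)
```

Decoding the value up front means a malformed annotation fails at compile 
time instead of producing an unreadable AVSC file. A verbatim text splice, as 
proposed, would defer that failure to whoever reads the schema next.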



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
