[ 
https://issues.apache.org/jira/browse/AVRO-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267690#comment-17267690
 ] 

Oscar Westra van Holthe - Kind commented on AVRO-3026:
------------------------------------------------------

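What you propose can already be done. Avro IDL accepts arbitrary annotations of the form @name(json-value) and carries them into the generated schema as properties: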
{code}
@tags(["PI", "Non PI"])
record Employee {
  @tags(["Name"])
  string fullName;
  boolean active = true;
  long salary;
  @tags(["Phone"])
  string phone;
} {code}
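
For the downstream side, here is a minimal sketch of reading such tags back out of a generated AVSC. It assumes the annotations end up as a "tags" property on the record object and on the individual field objects, as in the issue description; the exact placement in real compiler output should be checked. The helper name collect_tags is hypothetical, and only the Python standard library is used.

```python
import json

# Hypothetical AVSC text with record-level and field-level "tags" properties,
# mirroring the shape used in the issue description (an assumption, not
# captured compiler output).
AVSC = """
{
  "type": "record",
  "name": "Employee",
  "tags": ["PI", "Non PI"],
  "fields": [
    {"name": "fullName", "type": "string", "tags": ["Name"]},
    {"name": "active", "type": "boolean", "default": true},
    {"name": "salary", "type": "long"},
    {"name": "phone", "type": "string", "tags": ["Phone"]}
  ]
}
"""


def collect_tags(schema):
    """Return (record_tags, {field_name: tags}) for a record schema dict."""
    record_tags = schema.get("tags", [])
    # Only fields that actually carry a "tags" property are reported.
    field_tags = {
        field["name"]: field["tags"]
        for field in schema.get("fields", [])
        if "tags" in field
    }
    return record_tags, field_tags


record_tags, field_tags = collect_tags(json.loads(AVSC))
print(record_tags)
print(field_tags)
```

Untagged fields such as active and salary are simply skipped, so a processor can treat a missing entry as "no classification given".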

> Allow custom annotations in IDL files and support translating them to AVSC 
> Avro.
> --------------------------------------------------------------------------------
>
>                 Key: AVRO-3026
>                 URL: https://issues.apache.org/jira/browse/AVRO-3026
>             Project: Apache Avro
>          Issue Type: New Feature
>          Components: spec
>    Affects Versions: 1.9.0, 1.9.1, 1.9.2, 1.10.1
>            Reporter: Feroze Daud
>            Priority: Major
>
> h2. Introduction
> Our company has standardized on Avro schemas for all data ingestion and 
> storage. As part of this, and to satisfy CCPA, we need to be able to tag 
> records and fields appropriately if they contain PI, Non PI information, etc.
> Avro AVSC files, being valid JSON, can easily be modified to add tags that 
> will be used by downstream processors and won't interfere with Avro 
> itself (POJO generation, serialization, deserialization, etc.).
> One such key we chose is simply called *tags*. Its example usage is shown 
> below.
> {code:java}
> {
>    "type": "record",
>    "name": "PropertyOwner",
>    "namespace": "com.acme.Property", 
>    "tags": ["PI", "PII" ],
>    "fields": [
>    {
>       "name": "FullName",
>       "type": "string",
>       "tags": ["Name"]
>    },
>    {
>        "name": "PhoneNumber",
>        "type": "string",
>        "tags": ["Phone"]
>    }]
> }{code}
>  
> These tags can be read by downstream processors, so that data landing in a 
> data lake or database can be tagged appropriately.
>  
> h2. Problem Description
> While tagging works fine for AVSC, because adding extra properties doesn't 
> make the schema invalid, we have a problem when using IDL to author schemas. 
> The IDL spec does not provide a way to add extra tags that are copied over to 
> the Avro schema.
>  
> h2. Proposal
> I propose that we allow a special *@annotation* tag that can be applied to 
> records and fields. Whatever is in this annotation should be copied verbatim 
> to the output AVSC.
> For example:
> {code:java}
> @annotation("tags", "[\"PI\", \"Non PI\"]")
> record Employee {
>   @annotation("tags", "[\"Name\"]")
>   string fullName;
>   boolean active = true;
>   long salary;
>   @annotation("tags", "[\"Phone\"]")
>   string phone;
> } {code}
>  
> would generate an Avro schema as follows:
>  
> {code:java}
>  {
>    "type": "record",
>    "name": "Employee",
>    "tags": ["PI", "Non PI"],
>    "fields": [
>      {
>        "name": "fullName",
>        "type": "string",
>        "tags": ["Name"]
>      },
>      {
>        "name": "active",
>        "type": "boolean",
>        "default": true
>      },
>      {
>        "name": "salary",
>        "type": "long"
>      },
>      {
>        "name": "phone",
>        "type": "string",
>        "tags": ["Phone"]
>      }
>    ]
>  }{code}
>  
> As you can see, we don't need to validate that the value passed to 
> *@annotation* is well-formed JSON. It just takes a string, and we render that 
> string verbatim into the output JSON under the given key:
> @annotation("foo", "[\"bar\"]") -> "foo": ["bar"]
> @annotation("foo", "{\"bar\": \"jar\"}") -> "foo": {"bar": "jar"}
>  
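
The verbatim rendering proposed above can be sketched in a few lines. This is only an illustration of the idea under stated assumptions, not how the Avro IDL compiler works: render_annotation and render_record are hypothetical helpers, the key is JSON-encoded, and the value string is spliced in without validation, exactly as the proposal describes.

```python
import json


def render_annotation(key, raw_value):
    """Render one annotation as a JSON object member.

    The key is JSON-encoded; raw_value is copied as-is, without checking
    that it is valid JSON (the behaviour the proposal asks for).
    """
    return f"{json.dumps(key)}: {raw_value}"


def render_record(name, annotations):
    """Render a bare record schema with the given (key, raw_value) pairs."""
    members = ['"type": "record"', f'"name": {json.dumps(name)}']
    members += [render_annotation(key, raw) for key, raw in annotations]
    return "{" + ", ".join(members) + "}"


rendered = render_record("Employee", [("tags", '["PI", "Non PI"]')])
print(rendered)

# Parsing the result is how a caller would find out whether the spliced
# value was well formed; json.loads raises ValueError if it was not.
parsed = json.loads(rendered)
```

The trade-off is visible here: skipping validation keeps the annotation trivial to implement, but a malformed value only surfaces later, when something tries to parse the generated schema.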



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
