How about we define some common fields and then add a String field that holds
a JSON document, which can be used to save any special fields defined in an
external system? Eagle only needs to take care of the common fields, and the
external system can save any data it wants in that JSON string field.

Regards,
Daniel

-----Original Message-----
From: Zhang, Edward (GDI Hadoop) [mailto:yonzh...@ebay.com] 
Sent: Wednesday, January 13, 2016 1:18 PM
To: dev@eagle.incubator.apache.org
Subject: Re: suggestion: add field "threshold" to current "fileSensitivity 
structure" in Eagle

Yes, it looks like we need a schema abstraction which can represent any
sensitivity information.
sensitivityType and numOfOccurrences are just two common fields of the whole
sensitivity information.

For hdfs, the sensitivity information also includes filedir, while for hive,
the sensitivity information includes hiveResource, which could be a database,
table, column, etc.

Thanks
Edward

On 1/13/16, 0:35, "Prasad Mujumdar" <pras...@apache.org> wrote:

> The number of occurrences is certainly a good idea.
> For HDFS and any future data sources which don't have a native schema,
> how do we handle these fields which are defined in an external system?
> Are you proposing to add a schema abstraction as well?
>
>thanks
>Prasad
>
>
>On Tue, Jan 12, 2016 at 11:49 PM, Edward Zhang <yonzhang2...@apache.org> wrote:
>
>> Hi Daniel,
>>
>> That is a great idea to add more meaningful fields to the sensitivity
>> metadata; you can go ahead and design/add that.
>>
>> My only concern is: how do we name this field generally, and what else
>> is possible in the future? numOfOccurrences could be a good name, but
>> for hdfs and hive the occurrence is defined differently.
>>
>> Thanks
>> Edward
>>
>> On Mon, Jan 11, 2016 at 7:38 PM, Daniel Zhou <daniel.z...@dataguise.com> wrote:
>>
>> > Hi all,
>> >
>> > Recently I have been working on a project to automatically fetch the
>> > metadata of sensitive info stored in a DB and then create an eagle
>> > policy. I am wondering if we can add a field called "threshold" to the
>> > current "fileSensitivity structure" in eagle so that we can create a
>> > policy with more details.
>> >
>> > Our company's product "DgSecure" can automatically discover all the
>> > sensitive elements within every file in hadoop, so we have many details
>> > about this sensitive information. With this information, we can make
>> > the policy more precise. For example, I want to create a policy based
>> > on two parameters: one is "sensitivity type", the other is called
>> > "threshold". Only when the total number of elements of that particular
>> > sensitive type reaches or exceeds "threshold" can the alerts be
>> > triggered.
>> >
>> > So the trigger condition could be something like this:
>> > ........ if (sensitiveType == "MailAddress" && NumberOfSensData >= threshold) .....
>> >
>> > I think this condition makes more sense than just tagging a file with a
>> > sensitive type.
>> >
>> > Please let me know if you have any opinions or suggestions. :)
>> >
>> > Thanks!
>> > Daniel
>> >
>>
