How about we define some common fields and then add a String field holding a JSON string, which can carry any special fields defined in an external system? Eagle would only need to take care of the common fields, and an external system could save any data it wants in that JSON string field.
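
A minimal sketch of the idea, in Python for brevity rather than as Eagle's actual code. The names `SensitivityRecord`, `extra_json` and `should_alert` are hypothetical; only sensitivityType, numOfOccurrences, filedir and hiveResource come from this thread, and the path and table values are made up. It also shows the threshold condition from my original mail evaluated against the common fields only.

```python
import json
from dataclasses import dataclass


@dataclass
class SensitivityRecord:
    # Common fields discussed in this thread.
    sensitivity_type: str
    num_of_occurrences: int
    # Opaque JSON string owned by the external system (hypothetical field name).
    extra_json: str = "{}"

    def extras(self) -> dict:
        """Decode the source-specific fields; the common logic never needs them."""
        return json.loads(self.extra_json)


def should_alert(record: SensitivityRecord, sensitivity_type: str, threshold: int) -> bool:
    """Trigger only when the matching type reaches or exceeds the threshold."""
    return (record.sensitivity_type == sensitivity_type
            and record.num_of_occurrences >= threshold)


# An HDFS-style record: the extra field is filedir (made-up path).
hdfs_record = SensitivityRecord(
    sensitivity_type="MailAddress",
    num_of_occurrences=12,
    extra_json=json.dumps({"filedir": "/tmp/example"}),
)

# A Hive-style record: the extra field is hiveResource (made-up resource).
hive_record = SensitivityRecord(
    sensitivity_type="MailAddress",
    num_of_occurrences=3,
    extra_json=json.dumps({"hiveResource": "example_db.example_table"}),
)
```

With a threshold of 10, `should_alert(hdfs_record, "MailAddress", 10)` holds while the Hive record stays below the limit. The alert check never parses `extra_json`, which is the point of keeping the source-specific data in one opaque field.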

Regards,
Daniel

-----Original Message-----
From: Zhang, Edward (GDI Hadoop) [mailto:yonzh...@ebay.com]
Sent: Wednesday, January 13, 2016 1:18 PM
To: dev@eagle.incubator.apache.org
Subject: Re: suggestion: add field "threshold" to current "fileSensitivity structure" in Eagle

Yes, it looks like we need a schema abstraction which can represent any sensitivity information. sensitivityType and numOfOccurrences are just two common fields of the whole sensitivity information. For hdfs, the sensitivity information also includes filedir, while for hive, the sensitivity information includes hiveResource, which could be a database, table, column, etc.

Thanks
Edward

On 1/13/16, 0:35, "Prasad Mujumdar" <pras...@apache.org> wrote:

> The number of occurrences is certainly a good idea.
> For HDFS and any future data sources which don't have a native
> schema, how do we handle these fields which are defined in an
> external system? Are you proposing to add a schema abstraction as well?
>
> thanks
> Prasad
>
> On Tue, Jan 12, 2016 at 11:49 PM, Edward Zhang <yonzhang2...@apache.org> wrote:
>
>> Hi Daniel,
>>
>> That is a great idea to add more meaningful fields into sensitivity
>> metadata, you can go ahead to design/add that.
>>
>> Only one concern is: how do we name this field generally, and what
>> else is possible in the future? numOfOccurrences could be a good name;
>> for hdfs or hive, the occurrence is defined differently.
>>
>> Thanks
>> Edward
>>
>> On Mon, Jan 11, 2016 at 7:38 PM, Daniel Zhou <daniel.z...@dataguise.com> wrote:
>>
>> > Hi all,
>> >
>> > Recently I have been working on a project to automatically fetch the
>> > metadata of sensitive info stored in a DB and then create an eagle
>> > policy. I am wondering if we can add a field called "threshold" to
>> > the current "fileSensitivity structure" in eagle so that we can
>> > create a policy with more details.
>> >
>> > Our company's product "DgSecure" can discover all the sensitive
>> > elements within every file in hadoop automatically, so we have many
>> > details of this sensitive information. With this information, we can
>> > make the policy more precise. For example, I want to create a policy
>> > based on two parameters, one is "sensitivity type", the other is
>> > called "threshold". Only when the total number of that particular
>> > sensitive type element reaches or exceeds "threshold" can the alerts
>> > be triggered.
>> >
>> > So the trigger condition could be something like this:
>> > ........ if (sensitiveType == "MailAddress" && NumberOfSensData >= threshold) .....
>> >
>> > I think this condition makes more sense than just tagging a file
>> > with a sensitive type.
>> >
>> > Please let me know if you have any opinions or suggestions. :)
>> >
>> > Thanks!
>> > Daniel