The problem with this approach is that if our data structure is nested in
JSON, it is hard to use for policy creation. Today it looks like the Siddhi
CEP engine only supports flat structures, unless we customize a comparison
function for this JSON field.
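
As a minimal illustrative sketch (not Eagle or Siddhi code; the field names
and the dot-notation convention are assumptions), one workaround is to
flatten the nested JSON into flat dot-notation keys before handing the event
to a flat-schema engine:

```python
import json

def flatten(obj, prefix=""):
    """Recursively flatten a nested dict into dot-notation keys."""
    flat = {}
    for key, value in obj.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat

# Hypothetical sensitivity event with a nested "details" object.
event = json.loads(
    '{"sensitivityType": "MailAddress",'
    ' "details": {"numOfOccurrences": 12, "filedir": "/data/mail"}}'
)

# After flattening, every field is a top-level key a flat-schema
# engine can reference directly in a policy condition.
flat_event = flatten(event)
```

The flattened event exposes nested fields such as
`details.numOfOccurrences` as ordinary top-level attributes, so a policy
can compare against them without a custom comparison function.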

Thanks
Edward

On 1/13/16, 14:13, "Daniel Zhou" <daniel.z...@dataguise.com> wrote:

>How about we define some common fields and then add a String field that
>holds a JSON string, which can be used to save any special fields defined
>in an external system. Eagle only needs to take care of the common fields,
>and the external system can save any data it wants in that JSON string
>field.
>
>Regards,
>Daniel
>
>-----Original Message-----
>From: Zhang, Edward (GDI Hadoop) [mailto:yonzh...@ebay.com]
>Sent: Wednesday, January 13, 2016 1:18 PM
>To: dev@eagle.incubator.apache.org
>Subject: Re: suggestion: add field "threshold" to current
>"fileSensitivity structure" in Eagle
>
>Yes, it looks like we need a schema abstraction that can represent any
>sensitivity information.
>sensitivityType and numOfOccurrences are just two common fields of the
>whole sensitivity information.
>
>For HDFS, the sensitivity information also includes filedir, while for
>Hive, the sensitivity information includes hiveResource, which could be a
>database, table, column, etc.
>
>Thanks
>Edward
>
>On 1/13/16, 0:35, "Prasad Mujumdar" <pras...@apache.org> wrote:
>
>> The number of occurrences is certainly a good idea.
>> For HDFS and any future data sources that don't have a native schema,
>>how do we handle these fields, which are defined in an external system?
>>Are you proposing to add a schema abstraction as well?
>>
>>thanks
>>Prasad
>>
>>
>>On Tue, Jan 12, 2016 at 11:49 PM, Edward Zhang
>><yonzhang2...@apache.org>
>>wrote:
>>
>>> Hi Daniel,
>>>
>>> That is a great idea to add more meaningful fields into the
>>>sensitivity metadata; you can go ahead and design/add that.
>>>
>>> Only one concern: how do we name this field generally, and what else
>>>is possible in the future? numOfOccurrences could be a good name, but
>>>for HDFS and Hive the occurrence is defined differently.
>>>
>>> Thanks
>>> Edward
>>>
>>> On Mon, Jan 11, 2016 at 7:38 PM, Daniel Zhou
>>> <daniel.z...@dataguise.com>
>>> wrote:
>>>
>>> > Hi all,
>>> >
>>> > Recently I have been working on a project to automatically fetch
>>> >the metadata of sensitive info stored in a DB and then create an
>>> >Eagle policy. I am wondering if we can add a field called "threshold"
>>> >to the current "fileSensitivity structure" in Eagle so that we can
>>> >create a policy with more details.
>>> >
>>> > Our company's product "DgSecure" can automatically discover all the
>>> >sensitive elements within every file in Hadoop, so we have many
>>> >details about this sensitive information. With this information, we
>>> >can make the policy more precise. For example, I want to create a
>>> >policy based on two parameters: one is "sensitivity type", the other
>>> >is called "threshold". Only when the total number of elements of that
>>> >particular sensitive type reaches or exceeds the "threshold" will the
>>> >alerts be triggered.
>>> >
>>> > So the trigger condition could be something like this:
>>> > ........ if (sensitiveType == "MailAddress" && NumberOfSensData >=
>>> >threshold) .....
>>> >
>>> > I think this condition makes more sense than just tagging a file
>>> >with a sensitive type.
>>> >
>>> > Please let me know if you have any opinions or suggestions. :)
>>> >
>>> > Thanks!
>>> > Daniel
>>> >
>>>
>
