Sivaprasanna,

Thanks for joining this effort. I don’t recall what’s on the existing Jira, but 
please be very aware of the challenges in data anonymization and the various 
threat models. De-anonymizing data can leak PII, EPHI, PCI data, etc., and in 
some cases it can even put people in physical danger. 

There are a number of high-impact examples of avoidable scenarios like this:

https://arstechnica.com/tech-policy/2009/09/your-secrets-live-online-in-databases-of-ruin/

https://arstechnica.com/tech-policy/2014/06/poorly-anonymized-logs-reveal-nyc-cab-drivers-detailed-whereabouts/

We should use publicly reviewed algorithms, document the risks and known 
challenges well, take into consideration provenance and other NiFi-specific 
features, and write a good summary of these features if/when they are 
introduced. 

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Jun 20, 2018, at 10:06, Sivaprasanna <sivaprasanna...@gmail.com> wrote:
> 
> Wow... I didn't realize there was a JIRA already. I'm interested and would be
> happy to contribute my time & efforts on this.
> 
>> On Wed, Jun 20, 2018 at 10:34 PM, Matt Burgess <mattyb...@apache.org> wrote:
>> 
>> I think this is a great idea. I filed a Jira [1] a while ago in case
>> someone wanted to start working on it (or in case I got a chance). It
>> mentions ARX but any Apache-friendly implementation is of course
>> welcome. I think it should be in its own bundle as it is functionality
>> separate from all our other bundles (and not ubiquitous enough to put
>> in the standard NAR).
>> 
>> Glad to hear you're interested in this, please feel free to reach out
>> with any questions and I too would be happy to review any
>> contributions.
>> 
>> Thanks,
>> Matt
>> 
>> [1] https://issues.apache.org/jira/browse/NIFI-4492
>> 
>> On Wed, Jun 20, 2018 at 12:57 PM Mike Thomsen <mikerthom...@gmail.com>
>> wrote:
>>> 
>>> There's a framework called ARX that could be very useful for this. The only
>>> question you have is how compliant it would be with different sets of
>>> distinct legal requirements for privacy handling. In the absence of strong
>>> legal guidance, I'd say err on the side of complying with health care
>>> regulations because that's where you're likely to find the clearest
>>> guidance and established tools.
>>> 
>>> Ping me on any PR you send.
>>> 
>>> On Wed, Jun 20, 2018 at 12:49 PM Sivaprasanna <sivaprasanna...@gmail.com>
>>> wrote:
>>> 
>>>> With data becoming more critical and substantial to business development,
>>>> new stringent regulations & laws are being introduced (GDPR being a
>>>> recent example). I've been spending some time lately doing research on
>>>> data anonymization and, after some hefty thinking, I finally decided to go
>>>> ahead with the creation of a new processor bundle that has processors like
>>>> 'AnonymizeRecord' and 'DeanonymizeRecord' (not quite sure about the names
>>>> though). Following are my questions:
>>>> 
>>>>   - What do you guys think about these proposed processors?
>>>>   - If the processors are okay to be introduced, are they "standard"
>>>>   enough to get them added to our 'nifi-standard-bundles' module, or is
>>>>   it better to keep them separate, much like the AWS, Azure bundles,
>>>>   etc.?
>>>> 
>>>> Having said this, I'm very much in the beginning phase with my research
>>>> and development efforts, so all your input & feedback on this one are
>>>> greatly appreciated.
>>>> 
>>>> Thanks.
>>>> 
>>>> -
>>>> Sivaprasanna
>>>> 
>> 
