Sivaprasanna,

Thanks for joining this effort. I don’t recall what’s on the existing Jira, but please be very aware of the challenges in data anonymization and the various threat models. De-anonymizing data can leak PII, EPHI, PCI data, etc. In some cases, it can even put people in physical danger.
There are a number of high-impact examples of avoidable scenarios like this:

https://arstechnica.com/tech-policy/2009/09/your-secrets-live-online-in-databases-of-ruin/
https://arstechnica.com/tech-policy/2014/06/poorly-anonymized-logs-reveal-nyc-cab-drivers-detailed-whereabouts/

We should use publicly reviewed algorithms, document the risks and known challenges well, take into consideration provenance and other NiFi-specific features, and write a good summary of these features if/when they are introduced.

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

> On Jun 20, 2018, at 10:06, Sivaprasanna <sivaprasanna...@gmail.com> wrote:
>
> Wow.. I didn't realize there was a JIRA already. I'm interested and would be
> happy to contribute my time & efforts on this.
>
>> On Wed, Jun 20, 2018 at 10:34 PM, Matt Burgess <mattyb...@apache.org> wrote:
>>
>> I think this is a great idea. I filed a Jira [1] a while ago in case
>> someone wanted to start working on it (or in case I got a chance). It
>> mentions ARX, but any Apache-friendly implementation is of course
>> welcome. I think it should be in its own bundle, as it is functionality
>> separate from all our other bundles (and not ubiquitous enough to put
>> in the standard NAR).
>>
>> Glad to hear you're interested in this. Please feel free to reach out
>> with any questions, and I too would be happy to review any
>> contributions.
>>
>> Thanks,
>> Matt
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-4492
>>
>> On Wed, Jun 20, 2018 at 12:57 PM Mike Thomsen <mikerthom...@gmail.com> wrote:
>>>
>>> There's a framework called ARX that could be very useful for this. The only
>>> question you have is how compliant it would be with different sets of
>>> distinct legal requirements for privacy handling. In the absence of strong
>>> legal guidance, I'd say err on the side of complying with health care
>>> regulations because that's where you're likely to find the clearest
>>> guidance and established tools.
>>>
>>> Ping me on any PR you send.
>>>
>>> On Wed, Jun 20, 2018 at 12:49 PM Sivaprasanna <sivaprasanna...@gmail.com> wrote:
>>>
>>>> With data becoming more critical and substantial to business development,
>>>> new stringent regulations & laws are being introduced (GDPR being a recent
>>>> example). I've been spending some time lately doing research on data
>>>> anonymization, and after some hefty thinking, I finally decided to go ahead
>>>> with the creation of a new processor bundle that has processors like
>>>> 'AnonymizeRecord' and 'DeanonymizeRecord' (not quite sure about the names
>>>> though). Following are my questions:
>>>>
>>>> - What do you guys think about these proposed processors?
>>>> - If the processors are okay to be introduced, are they "standard"
>>>>   enough to get them added to our 'nifi-standard-bundles' module, or is it
>>>>   better to keep them separate, much like the AWS, Azure bundles, etc.?
>>>>
>>>> Having said this, I'm very much in the beginning phase with my research and
>>>> development efforts, so all your inputs & feedback on this one are greatly
>>>> appreciated.
>>>>
>>>> Thanks.
>>>>
>>>> -
>>>> Sivaprasanna