[ https://issues.apache.org/jira/browse/MADLIB-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan updated MADLIB-911:
-----------------------------------
    Description:
Story

As a data scientist, I want to perform anonymization operations on my data,
so that I can prepare it for input to predictive analytics algorithms. I
also want to be able to de-anonymize my data.

Proposed functionality:
* Create conversion table for anonymization.
* Create an anonymized version of a table.
* Create a deanonymized version of a table

Must be able to:
* anonymize multiple columns in a table
* datasets will still join correctly even on masked columns
* the aggregates on masked columns will match to the original

References

[1] PDL tools
http://pivotalsoftware.github.io/PDLTools/group__grp__anonymization.html

[2] General information on anonymization
https://en.wikipedia.org/wiki/Data_anonymization

[3] Blog on hashing
https://crackstation.net/hashing-security.htm

  was:
Story

As a data scientist, I want to perform anonymization operations on my data,
so that I can prepare it for input to predictive analytics algorithms. I
also want to be able to de-anonymize my data.

Proposed functionality:
* Create conversion table for anonymization.
* Create an anonymized version of a table.
* Create a deanonymized version of a table

Must be able to:
* anonymize multiple columns in a table
* datasets will still join correctly even on masked columns
* the aggregates on masked columns will match to the original

References

[1] PDL tools
http://pivotalsoftware.github.io/PDLTools/group__grp__anonymization.html

[2] General information on anonymization
https://en.wikipedia.org/wiki/Data_anonymization


> Anonymization
> -------------
>
>                 Key: MADLIB-911
>                 URL: https://issues.apache.org/jira/browse/MADLIB-911
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Assignee: Himanshu Pandey
>            Priority: Major
>              Labels: starter
>
> Story
> As a data scientist, I want to perform anonymization operations on my data,
> so that I can prepare it for input to predictive analytics algorithms. I
> also want to be able to de-anonymize my data.
> Proposed functionality:
> * Create conversion table for anonymization.
> * Create an anonymized version of a table.
> * Create a deanonymized version of a table
> Must be able to:
> * anonymize multiple columns in a table
> * datasets will still join correctly even on masked columns
> * the aggregates on masked columns will match to the original
> References
> [1] PDL tools
> http://pivotalsoftware.github.io/PDLTools/group__grp__anonymization.html
> [2] General information on anonymization
> https://en.wikipedia.org/wiki/Data_anonymization
> [3] Blog on hashing
> https://crackstation.net/hashing-security.htm



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)