Re: Data anonymization in Nifi

2017-10-31 Thread Matt Burgess
Vyshali,

I would love to help, but I've never used ARX so I'm not at all
familiar with their APIs. They do have an examples page though [1].

Regards,
Matt

[1] http://arx.deidentifier.org/overview/#a3


On Tue, Oct 31, 2017 at 1:11 PM, Vyshali  wrote:
> Hi Matt,
>
> Thanks for your valuable comment.
>
> Is it possible to anonymize data without specifying generalization
> hierarchies in ARX?
> Also, can you please help me with some basic examples using the ARX APIs?
>
> Regards,
> Vyshali
>
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: Data anonymization in Nifi

2017-10-31 Thread Vyshali
Hi Matt,

Thanks for your valuable comment.

Is it possible to anonymize data without specifying generalization
hierarchies in ARX?
Also, can you please help me with some basic examples using the ARX APIs?

Regards,
Vyshali






Re: Data anonymization in Nifi

2017-10-24 Thread Mike Thomsen
Groovy is very close to being a superset of Java 7 in terms of syntax, so
in most cases you can copy and paste Java code directly into a Groovy
script without modification.

On Tue, Oct 24, 2017 at 8:52 AM, Vyshali  wrote:

> Matt,
>
> Thanks for your valuable suggestion.
> ARX is a Java library, but only languages such as Groovy and Python (Jython)
> are available in the ExecuteScript processor. Have you tried using ARX
> functionality from any of these languages?
> If so, please send some references.
>
> Thanks,
> Vyshali


Re: Data anonymization in Nifi

2017-10-24 Thread Vyshali
Matt,

Thanks for your valuable suggestion.
ARX is a Java library, but only languages such as Groovy and Python (Jython)
are available in the ExecuteScript processor. Have you tried using ARX
functionality from any of these languages?
If so, please send some references.

Thanks,
Vyshali





Re: Data anonymization in Nifi

2017-10-23 Thread Matt Burgess
Vyshali,

The AnonymizeRecord processor does not yet exist; I just wrote up a
Jira to track adding it, possibly sometime in the future.

For the scripted solution, you can add the location of the ARX JARs to
the Module Directory property of ExecuteScript. If it is a flat
directory of JARs and you are using Groovy, Clojure, or JavaScript,
you can just set the Module Directory to the directory containing the
JARs. Otherwise you'd have to list the JARs separately (for languages
such as Jython).  Once the Module Directory property is set, you can
import and use any of the ARX classes according to their
documentation.

For examples on using the NiFi API (to read/write flow files, etc.), I
have an ExecuteScript Cookbook blog series [1] and a few other
examples on my blog [2].

Regards,
Matt

[1] https://community.hortonworks.com/articles/75032/executescript-cookbook-part-1.html
[2] http://funnifi.blogspot.com


On Mon, Oct 23, 2017 at 12:41 PM, Vyshali  wrote:
> Hi Matt,
>
> Thanks for the suggestion.
> It would be very helpful if you could give instructions on how to use
> the AnonymizeRecord processor.
> Please also clarify how to set up the processor after downloading the ARX
> JARs.
> I downloaded the JAR from http://arx.deidentifier.org/downloads/
> 
>
> Regards,
> Vyshali


Re: Data anonymization in Nifi

2017-10-23 Thread Vyshali
Hi Matt,

Thanks for the suggestion.
It would be very helpful if you could give instructions on how to use
the AnonymizeRecord processor.
Please also clarify how to set up the processor after downloading the ARX
JARs.
I downloaded the JAR from http://arx.deidentifier.org/downloads/
  

Regards,
Vyshali






Re: Data anonymization in Nifi

2017-10-22 Thread Chris Herssens
Hello Vyshali

Below is a Python code example that hashes the fourth column of a
CSV file using the ExecuteScript processor.
Note that hashing a field with SHA-256 changes the length of the field.
A SHA-256 digest is 256 bits long, which is 64 characters when written as
hexadecimal, as this script does.

import hashlib
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

def hashField(text):
    # SHA-256 hex digest; encoded with the same charset the content is read with
    return hashlib.sha256(text.encode('latin-1')).hexdigest()

class convertStream(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.ISO_8859_1)
        output = []
        for line in text.splitlines():
            l = line.split(';')
            # hash the fourth column (lower-cased first)
            l[3] = hashField(l[3].lower())
            # append an extra column: <hash>_<column 1>_<column 2>
            l.append(l[3] + "_" + l[0] + "_" + l[1])
            output.append(';'.join(l))
        out = '\n'.join(output)
        outputStream.write(out.encode('latin-1'))

flowfile = session.get()
if flowfile is not None:
    flowfile = session.write(flowfile, convertStream())
    # rename the flow file: the part before the first '.' plus '_hashed'
    flowfile = session.putAttribute(flowfile, "filename",
        flowfile.getAttribute('filename').split('.')[0] + '_hashed')
    session.transfer(flowfile, REL_SUCCESS)
    session.commit()
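
The per-line transformation in the script above can also be tried outside NiFi. The following is a minimal sketch in plain Python, not part of the original script: the function name hash_column and the sample row are made up for illustration, and the NiFi session and stream handling are left out.

```python
import hashlib

def hash_column(line, index=3, sep=";"):
    """Hash one field of a delimited line and append a derived key column."""
    fields = line.split(sep)
    # Replace the chosen field with the SHA-256 hex digest of its lower-cased value.
    fields[index] = hashlib.sha256(fields[index].lower().encode("latin-1")).hexdigest()
    # Extra column: <hash>_<first field>_<second field>, as in the script above.
    fields.append(fields[index] + "_" + fields[0] + "_" + fields[1])
    return sep.join(fields)

row = "1001;Jane;Doe;Jane.Doe@example.com"  # made-up sample row
hashed = hash_column(row)
print(len(row.split(";")[3]), len(hashed.split(";")[3]))  # original vs. hashed field length
```

Because the digest has a fixed size, the hashed column no longer has the length or shape of the original value.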



Regards,

Chris

On Fri, Oct 20, 2017 at 7:19 PM, Vyshali  wrote:

> Hi Chris,
>
> Thanks for the suggestion. Should I write code in Python or some other
> language to hash the data using the ExecuteScript processor? If so, will
> the format of the data be retained after hashing?
> Please provide some clarity on that.
>
> Thanks,
> Vyshali


Re: Data anonymization in Nifi

2017-10-20 Thread Vyshali
Hi Chris,

Thanks for the suggestion. Should I write code in Python or some other
language to hash the data using the ExecuteScript processor? If so, will
the format of the data be retained after hashing?
Please provide some clarity on that.

Thanks,
Vyshali





Re: Data anonymization in Nifi

2017-10-17 Thread Andy LoPresto
Vyshali,

You may be interested in format preserving encryption (FPE) [1] if you need to 
maintain format while performing data masking. There are also methods to derive 
a cryptographically secure hash function from encryption [2] so that you can 
have “one way” data transformation and maintain a given format.
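
As a rough illustration of the "one way" idea (not of FPE itself), the sketch below masks a value with a keyed hash while keeping its shape: digits stay digits, letters stay letters, and separators are left alone. Everything in it is an assumption made for illustration: the function name, the key, and the character mapping are hypothetical, the mapping has a slight modulo bias, and it has had no cryptographic review. It is not a substitute for a vetted format-preserving encryption mode.

```python
import hashlib
import hmac

def mask_preserving_format(value, key):
    """One-way keyed masking: digits stay digits, letters stay letters (case kept)."""
    # HMAC-SHA256 over the whole value gives 32 pseudo-random bytes tied to the key.
    stream = bytearray(hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest())
    out = []
    for pos, ch in enumerate(value):
        b = stream[pos % len(stream)]  # bytes repeat for values longer than 32 characters
        if "0" <= ch <= "9":
            out.append(chr(ord("0") + b % 10))
        elif "a" <= ch <= "z":
            out.append(chr(ord("a") + b % 26))
        elif "A" <= ch <= "Z":
            out.append(chr(ord("A") + b % 26))
        else:
            out.append(ch)  # separators and other characters pass through unchanged
    return "".join(out)

key = b"replace-with-a-secret-key"  # hypothetical key, for illustration only
print(mask_preserving_format("555-12-3456", key))
```

The same input and key always give the same masked value, so joins on the masked column still work, but the original cannot be decrypted back as it could with real FPE.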

I would encourage you to be aware of all attack surfaces here, though. First, 
there are many examples of anonymization being easily undone because it was not 
correctly implemented [3], used a weak process [4], or could be reconstructed 
through associated data [5]. Even with a strong anonymization approach, 
remember that NiFi tracks the data lineage throughout the process, so a user 
with sufficient permissions will be able to look at the provenance for a 
flowfile before/after it has undergone the anonymization operation and see the 
original data. This can be partially mitigated and restricted to a core group 
of privileged users via strict access control policies. On top of that, the 
provenance repository does provide an encrypted implementation, but the content 
and flowfile repositories currently do not. A malicious user with OS-level 
access could examine the repository files on disk to extract the original 
content or flowfile attributes before they were anonymized. There are open 
Jiras [6][7] for those efforts. There is also the issue of a user examining the 
flowfile via queue listing. Open Jiras for encrypting attributes [8] and 
hashing attributes [9], as well as “sensitive attributes” with 
per-key-permissions also exist [10].
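
One way a process can be weak, sketched under assumed conditions: if a field with a small set of possible values (here, hypothetically, 5-digit ZIP codes) is hashed without any secret, the original can be recovered by hashing every candidate and comparing. The function names and the sample value below are made up for illustration; the keyed hash (HMAC) is shown only as one common mitigation, and it is only as strong as the secrecy of its key.

```python
import hashlib
import hmac

def unsalted(value):
    """Plain SHA-256 of the value, with no secret involved."""
    return hashlib.sha256(value.encode("ascii")).hexdigest()

def keyed(value, key):
    """HMAC-SHA256 of the value under a secret key."""
    return hmac.new(key, value.encode("ascii"), hashlib.sha256).hexdigest()

def recover_by_enumeration(target, candidates):
    """Return the candidate whose unsalted hash equals target, or None."""
    for candidate in candidates:
        if unsalted(candidate) == target:
            return candidate
    return None

zip_codes = ["%05d" % n for n in range(100000)]  # the entire 5-digit space

leaked = unsalted("81667")
print(recover_by_enumeration(leaked, zip_codes))     # prints 81667

protected = keyed("81667", b"hypothetical-secret-key")
print(recover_by_enumeration(protected, zip_codes))  # prints None
```

An attacker who obtains the key can repeat the enumeration with the keyed function, so the key needs the same protection as the original data.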

I hope this helps to illustrate the complexities of anonymization and leads you 
to a successful solution.


[1] https://en.wikipedia.org/wiki/Format-preserving_encryption
[2] https://crypto.stackexchange.com/questions/24284/is-there-a-format-preserving-cryptographically-secure-hash
[3] https://dataprivacylab.org/dataprivacy/projects/linkage/lidap-wp19.pdf
[4] https://arstechnica.com/tech-policy/2014/06/poorly-anonymized-logs-reveal-nyc-cab-drivers-detailed-whereabouts/
[5] https://hbr.org/2015/02/theres-no-such-thing-as-anonymous-data

[6] https://issues.apache.org/jira/browse/NIFI-3834
[7] https://issues.apache.org/jira/browse/NIFI-3833
[8] https://issues.apache.org/jira/browse/NIFI-2961
[9] https://issues.apache.org/jira/browse/NIFI-1885
[10] https://issues.apache.org/jira/browse/NIFI-1140


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Oct 17, 2017, at 10:36 AM, Mike Thomsen  wrote:
> 
> Not if you use hashing. You'll get a field value like this (sha1
> algorithm): c3499c2729730a7f807efb8676a92dcb6f8a3f8f
> 
> To get output that stays closer to the original data in the kinds of
> values present, you'll need to try something like ARX.
> 
> On Tue, Oct 17, 2017 at 11:53 AM, Vyshali  wrote:
> 
>> Hi Chris,
>> 
>> Hashing with the ExecuteScript processor means that I have to write some
>> code to do that. If so, will the format of the field remain the same?
>> 
>> Please explain with examples.
>> 
>> Regards,
>> Vyshali





Re: Data anonymization in Nifi

2017-10-17 Thread Mike Thomsen
Not if you use hashing. You'll get a field value like this (sha1
algorithm): c3499c2729730a7f807efb8676a92dcb6f8a3f8f

To get output that stays closer to the original data in the kinds of
values present, you'll need to try something like ARX.
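
To illustrate what "closer to the original data" can mean, here is a minimal sketch of generalization in plain Python. It does not use the ARX API; the function names and sample ZIP codes are made up, and the masking rule is just one assumed example of a generalization hierarchy (81667, 8166*, 816**, and so on).

```python
from collections import Counter

def generalize_zip(zip_code, level):
    """Mask the last `level` characters of a ZIP code, e.g. level 2: 81667 -> 816**."""
    level = max(0, min(level, len(zip_code)))
    return zip_code[:len(zip_code) - level] + "*" * level

def smallest_group(values):
    """Size of the rarest value: how many rows share the least common value."""
    return min(Counter(values).values())

zips = ["81667", "81675", "81925", "81931"]  # made-up sample values

for level in range(4):
    generalized = [generalize_zip(z, level) for z in zips]
    print(level, generalized, smallest_group(generalized))
```

With these sample values, level 2 is the first level at which every generalized value is shared by at least two rows, while the values still look like partial ZIP codes rather than opaque digests.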

On Tue, Oct 17, 2017 at 11:53 AM, Vyshali  wrote:

> Hi Chris,
>
> Hashing with the ExecuteScript processor means that I have to write some
> code to do that. If so, will the format of the field remain the same?
>
> Please explain with examples.
>
> Regards,
> Vyshali


Re: Data anonymization in Nifi

2017-10-17 Thread Vyshali
Hi Chris,

Hashing with the ExecuteScript processor means that I have to write some
code to do that. If so, will the format of the field remain the same?

Please explain with examples.

Regards,
Vyshali





Re: Data anonymization in Nifi

2017-10-17 Thread Matt Burgess
Vyshali,

Building on Chris's suggestion of using ExecuteScript, you could also
include the ARX JAR(s) in your Module Directory property, and then
leverage all the ARX goodness [1].  In general this does seem like a
good idea for a processor, so I have written NIFI-4492 [2] to add an
AnonymizeRecord processor. It need not use ARX, but I did mention it in
the Jira case.

Regards,
Matt

[1] http://arx.deidentifier.org/api/
[2] https://issues.apache.org/jira/browse/NIFI-4492


On Tue, Oct 17, 2017 at 8:09 AM, Chris Herssens
<chris.herss...@gmail.com> wrote:
> You can use the ExecuteScript processor to hash some fields in, for
> instance, CSV data.
>
> Regards,
>
> Chris
>
> On Tue, Oct 17, 2017 at 8:41 AM, Vyshali <vyshal...@honeywell.com> wrote:
>
>> Hi,
>>
>> Please suggest possible ways to anonymize data in NiFi so that PII
>> is not exposed.
>> Please also suggest suitable processors for this.
>> Thanks in advance.
>>
>> Regards,
>> Vyshali


Re: Data anonymization in Nifi

2017-10-17 Thread Chris Herssens
You can use the ExecuteScript processor to hash some fields in, for
instance, CSV data.

Regards,

Chris

On Tue, Oct 17, 2017 at 8:41 AM, Vyshali <vyshal...@honeywell.com> wrote:

> Hi,
>
> Please suggest possible ways to anonymize data in NiFi so that PII
> is not exposed.
> Please also suggest suitable processors for this.
> Thanks in advance.
>
> Regards,
> Vyshali


Data anonymization in Nifi

2017-10-17 Thread Vyshali
Hi,

Please suggest possible ways to anonymize data in NiFi so that PII
is not exposed.
Please also suggest suitable processors for this.
Thanks in advance.

Regards,
Vyshali


