Jira (PDB-1129) puppetdb anonymize should use stronger hash function

2015-01-22 Thread Kenneth Barber (JIRA)
 
Kenneth Barber commented on PDB-1129
 
Re: puppetdb anonymize should use stronger hash function
 
Daniel Dreier, they are consistent because we use memoization to always anonymize the same string the same way each time. This is important because part of PuppetDB's storage efficiency comes from de-duplicating elements. To test with real-world data, it's important that we keep this characteristic; otherwise the anonymised data we receive would lose some of its value to us.
The trick is in the memoize function here: https://github.com/puppetlabs/puppetdb/blob/master/src/puppetlabs/puppetdb/anonymizer.clj#L147
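
To illustrate the memoization idea described above, here is a minimal Python sketch. It is not the PuppetDB implementation (that is Clojure, at the link above); the function name `anonymize_string` and the choice to keep the replacement the same length as the input are assumptions made only for this example.

```python
import functools
import random
import string


# Hypothetical helper, not PuppetDB's API: lru_cache memoizes the result,
# so a repeated input gets the same random replacement for the lifetime
# of the process.
@functools.lru_cache(maxsize=None)
def anonymize_string(original: str) -> str:
    """Return a random replacement string for `original`."""
    # Keeping the same length is an illustrative assumption only.
    return "".join(random.choice(string.ascii_letters) for _ in original)


first = anonymize_string("s3cret-password")
second = anonymize_string("s3cret-password")
print(first == second)  # the cached replacement is reused for equal inputs
```

Because the replacement is drawn at random and only remembered in the cache, it carries no information about the original string beyond "these two inputs were equal", which is the property that keeps de-duplication meaningful.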
 
 This message was sent by Atlassian JIRA (v6.3.10#6340-sha1:7ea293a) 

-- 
You received this message because you are subscribed to the Google Groups "Puppet Bugs" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-bugs+unsubscr...@googlegroups.com.
To post to this group, send email to puppet-bugs@googlegroups.com.
Visit this group at http://groups.google.com/group/puppet-bugs.
For more options, visit https://groups.google.com/d/optout.


Jira (PDB-1129) puppetdb anonymize should use stronger hash function

2015-01-22 Thread Daniel Dreier (JIRA)
 
Daniel Dreier commented on PDB-1129
 
Re: puppetdb anonymize should use stronger hash function
 
Thanks, I appreciate the clarification. Looking at anonymized catalogs, I had just noticed that some of our password fields were consistently being transformed into the same string on multiple nodes:

nesmivxvny.updwtobauiqzmot.tey.json:"sendgridpassword" : "VaoO2RU5sh10SmOSTNCn4Nz2BE7jJN",
nfqvamgghr.kiscjhljfpykkrc.alr.json:"sendgridpassword" : "VaoO2RU5sh10SmOSTNCn4Nz2BE7jJN",
nsicingnnd.kcxtlxjdcaeaotw.crf.json:"sendgridpassword" : "VaoO2RU5sh10SmOSTNCn4Nz2BE7jJN",
nyviworens.sfqijlzbejkvaft.bwt.json:"sendgridpassword" : "VaoO2RU5sh10SmOSTNCn4Nz2BE7jJN",
nzcgyapaut.mdsqzgmgprbzdim.ile.json:"sendgridpassword" : "VaoO2RU5sh10SmOSTNCn4Nz2BE7jJN",
obshccyrld.yxjdmkjaezaopny.ruc.json:"sendgridpassword" : "VaoO2RU5sh10SmOSTNCn4Nz2BE7jJN",
ocdsdhkysd.tlidopfakhduadc.swl.json:"sendgridpassword" : "VaoO2RU5sh10SmOSTNCn4Nz2BE7jJN",
ocvsflhnop.sfimtslwvmsiuoq.xae.json:"sendgridpassword" : "VaoO2RU5sh10SmOSTNCn4Nz2BE7jJN",

Jira (PDB-1129) puppetdb anonymize should use stronger hash function

2015-01-22 Thread Kenneth Barber (JIRA)
 
Kenneth Barber commented on PDB-1129
 
Re: puppetdb anonymize should use stronger hash function
 
Yep, Wyatt Alt is correct here. Our use of hashes in PuppetDB is generally for "pseudo-uniquely" identifying a report/catalog/fact document in our database for performance reasons, so that the comparison is fast (versus comparing the entire document), not for hiding secret data. In fact, the only thing we generally care about when it comes to the original data is keeping its "shape" (whether a value is a hash, array, string, number, etc.). New random data is generated using various randomization functions (for strings, numbers, booleans, etc.), and this data has no relation to the original data beyond the "shape". As Wyatt says, we have to generate a new hash of the document after anonymization, which is why we do it in that code.
So the use of sha1 hashing here shouldn't by itself provide a way to brute-force the data back to its original form, if that's what you are concerned about. If the anonymization is run correctly, any passwords should be replaced with useless strings only. Chances are they won't even conform to any crypto format after anonymization, to be honest; we're just not that clever today, and we just throw out random strings of characters instead.
So I don't think we should panic here. The use of lesser-grade algorithms isn't always a security issue; it all depends on the context in which they are used.
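
As a rough illustration of what "keeping the shape" could look like, here is a small Python sketch. It is only an assumption-laden example, not PuppetDB's anonymizer: the names `random_string` and `anonymize_value`, the value ranges, and the decision to leave map keys untouched are all hypothetical.

```python
import random
import string


def random_string(length: int) -> str:
    """Hypothetical helper: a random lowercase string of the given length."""
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))


def anonymize_value(value):
    """Replace a value with unrelated random data of the same type ("shape")."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return random.choice([True, False])
    if isinstance(value, int):
        return random.randint(0, 10**6)
    if isinstance(value, float):
        return random.random()
    if isinstance(value, str):
        return random_string(len(value))
    if isinstance(value, list):
        return [anonymize_value(item) for item in value]
    if isinstance(value, dict):
        # Assumption for this sketch: keys are kept, only values are randomized.
        return {key: anonymize_value(item) for key, item in value.items()}
    return value  # None and unrecognised types pass through unchanged


document = {"password": "hunter2", "port": 8080, "enabled": True,
            "tags": ["a", "bc"], "nested": {"ratio": 0.5}}
print(anonymize_value(document))
```

The output has the same nesting and value types as the input, but none of the new values are derived from the old ones, so there is nothing to brute-force back to the original.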


Jira (PDB-1129) puppetdb anonymize should use stronger hash function

2015-01-22 Thread Wyatt Alt (JIRA)
 
Wyatt Alt commented on PDB-1129
 
Re: puppetdb anonymize should use stronger hash function
 
I think there's a misunderstanding here. We're using sha1 in that file to create a hash of a report/factset/catalog after it has already been anonymized by replacing data with random strings. 
See https://github.com/puppetlabs/puppetdb/blob/master/src/puppetlabs/puppetdb/cli/anonymize.clj#L210 
PDB uses sha1 for various internal comparisons and existence checks, but not for concealing private data. The hash must be the true hash of the stored data for internal consistency, but if the stored data is already anonymized I don't think there's really an issue. 
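
The ordering described above (anonymize first, then hash the result) can be sketched in Python as follows. This is a hedged illustration only: the `document_hash` helper, the canonical-JSON serialization, and the sample documents are assumptions, and PuppetDB's actual hashing code may serialize documents differently.

```python
import hashlib
import json


def document_hash(document: dict) -> str:
    """SHA-1 hex digest of a canonical JSON serialization (hypothetical helper)."""
    # Sorting keys makes the digest independent of dict insertion order.
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


# Made-up sample data: the second document stands in for the output of an
# anonymization pass that replaced every value with a random string.
original = {"certname": "web01.example.com", "password": "hunter2"}
anonymized = {"certname": "qzkfjw.example.test", "password": "xjqhrtw"}

# The stored hash is computed over the anonymized payload, so it identifies
# the stored document and reveals nothing about the original values.
stored = {"hash": document_hash(anonymized), "payload": anonymized}
print(stored["hash"] == document_hash(stored["payload"]))
```

In this arrangement the hash only serves as a fast identity check for the stored document; its input never contains the secret data.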


Jira (PDB-1129) puppetdb anonymize should use stronger hash function

2015-01-21 Thread Daniel Dreier (JIRA)
 
Daniel Dreier created an issue

PuppetDB / PDB-1129
puppetdb anonymize should use stronger hash function

Issue Type: Improvement
Assignee: Unassigned
Created: 2015/01/21 9:55 PM
Labels: security
Priority: Normal
Reporter: Daniel Dreier
 
Based on my reading of https://github.com/puppetlabs/puppetdb/blob/master/src/puppetlabs/puppetdb/cli/anonymize.clj, it looks to me like the hashing function used to anonymize data is sha1.
Given that sha1 is gradually being phased out in other contexts (Google, Mozilla), and we do expect people to hash secrets with it, it seems like we should adopt a more secure hashing algorithm, like sha256 or bcrypt.
That said, I'm not a Clojure developer, so I may have misread the code.