On Thu, Feb 6, 2014 at 1:40 AM, Daniel Thomas <[email protected]> wrote:
> On 06/02/14 01:35, Trevor Perrin wrote:
>> I like the smaller size of the pseudowords, particularly for
>> transcribing these things, spelling out the characters over the phone,
>> or viewing on a small screen. And a lot of the words are unusual so
>> are going to need to be spelled out.
>>
>> But it would be interesting to see what a better wordlist looks like.
>
> Diceware[0] has a (fairly short 7776) word list in multiple languages
> for the purpose of generating easy to remember passphrases.

Hi Daniel,

That's about a 13-bit list (log2(7776) is roughly 12.9), which seems
like a good middle ground between an 8-bit list (like PGPfone) and a
16-bit list. An 8-bit list requires 16-word fingerprints for 128-bit
security, which feels like too many words. A 16-bit list contains 65K
words, which is more than most people's vocabulary, meaning a lot of
unusual words that would have to be spelled out.

The Diceware dictionary is designed around short words and word
fragments (it includes numbers, punctuation, and non-words, which is a
bit weird IMO).

I wrote a script to generate 10 random Diceware words to see what
fingerprints might look like:

https://github.com/trevp/keyname

hop - flu - urn - belie - gogo - gravy - mayor - avow - plush - enter
bump - seem - soft - lm - plane - exit - plus - stilt - behind - malta
tract - rude - rhine - ready - climb - fell - fell - reek - cody - kudzu
bunch - sound - adler - galt - signor - glom - soup - on - lund - juju
essay - eave - ef - pro - stung - gn - smash - josef - vetch - busy
dawson - tic - vy - cake - rock - sr - store - ice - plunk - gp
old - swept - win - mike - xy - chill - seethe - allow - alva - jh
grace - curia - coke - rebut - 15 - foray - jaw - weco - anvil - buenos
pn - adair - swelt - faith - slash - berlin - watch - blood - start - santa
grow - del - bon - 99th - kepler - cam - fun - 37th - dryad - prone

The comparison below puts 5 Diceware fingerprints side-by-side with 5
pseudoword fingerprints of score=18. The pseudoword fingerprints took
an average of ~30 seconds apiece to generate on a single core of my
MacBook Air.
(The max possible score is 20; a score of 18 means 2 deviations from
vowel/consonant alternation):

oman - swath - haze - elmer - gouda - admix - feat - afar - reel - for
ukigex - 3kiw - jejod - yvak - rewupa

blitz - teal - emma - bambi - queen - 92 - mecum - om - derek - twa
lijuv7 - woxm - pokoj - cixa - ehajen

op - zomba - 84th - soy - oval - evolve - spook - fk - ghi - magog
syivoh - upim - leewo - hoda - madeso

piotr - vain - david - mk - gasp - buoy - malt - az - hang - rena
bewora - zutm - hirub - ugux - tlezeb

perk - fate - cinch - gulf - jb - marks - wag - canoe - sprig - maw
ripoyu - ime2 - fenef - aqos - lehnof

Both approaches seem pretty decent; I'm not sure which is best.
Choosing 13-bit wordlists for different languages and dealing with
cross-language compatibility seems a hassle, but so is computing tens
of millions of hashes for a fingerprint.

There's a lot more that could be done here: e.g. make a better
wordlist than Diceware, or optimize the pseudoword search and do
better scoring. If anyone wants to do UX research, these would be
great projects...

Trevor
_______________________________________________
Messaging mailing list
[email protected]
https://moderncrypto.org/mailman/listinfo/messaging
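
For anyone who wants to play with the scoring idea mentioned above,
here is a rough sketch of one way a vowel/consonant alternation score
could be computed. This is not the keyname script's actual scoring
code; it is a guess at the rule from the description. The assumptions
are mine: "y" counts as a consonant, a digit never counts as
alternating with its neighbour, and one point is awarded per adjacent
pair of characters inside a word. The names alternation_score and
max_score are hypothetical. Under these assumptions the five
pseudoword samples each come out at 18 of a possible 20.

```python
VOWELS = set("aeiou")                      # assumption: 'y' is a consonant
CONSONANTS = set("bcdfghjklmnpqrstvwxyz")


def char_class(ch):
    """Return 'V' for a vowel, 'C' for a consonant, None for anything else."""
    if ch in VOWELS:
        return "V"
    if ch in CONSONANTS:
        return "C"
    return None                            # assumption: digits break alternation


def split_words(fingerprint):
    """Split 'abc - def' style fingerprints into their words."""
    return [w.strip() for w in fingerprint.split("-") if w.strip()]


def alternation_score(fingerprint):
    """Count adjacent character pairs that alternate vowel/consonant."""
    score = 0
    for word in split_words(fingerprint):
        for a, b in zip(word, word[1:]):
            ca, cb = char_class(a), char_class(b)
            if ca is not None and cb is not None and ca != cb:
                score += 1
    return score


def max_score(fingerprint):
    """Best possible score: every adjacent pair inside each word alternates."""
    return sum(len(w) - 1 for w in split_words(fingerprint))


if __name__ == "__main__":
    sample = "ukigex - 3kiw - jejod - yvak - rewupa"
    print(alternation_score(sample), "of", max_score(sample))
```

A search for high-scoring fingerprints would presumably call something
like alternation_score on each candidate and keep only those at or
above a threshold, which is where the hashing cost comes from.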
