I'm looking for some feedback on crypto privacy protections for a
geolocation research project I'm working on with the Mozilla Services
team. If you have general questions or suggestions about the project,
I'm happy to answer them, but I'd like to focus this thread on crypto.
Our team is prototyping a crowd-sourced version of Google's Street View
cars that correlates Wi-Fi access points and cell towers with GPS positions.
Our primary motivation is to provide non-proprietary location services
for Firefox OS devices. We would also like to publish this location data
for researchers or other projects that might have novel uses for it.
Google's Location Service prevents people from tracking individual
access points by requiring requests to include at least 2-3 access
points that Google knows are near each other. This "proves" the
requester is near the access points.
Below is a sketch of a scheme that I think will allow us to publish a
database of access point locations while still requiring knowledge of
two neighboring access points.
Unlike Google's Location Service, our server does not store MAC
addresses or SSIDs. We identify access points by hash IDs, specifically
SHA1(MAC+SSID). To query the location of an access point in the
database, you must know both its MAC address and current SSID.
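
A minimal sketch of the hash ID computation, in Python. The MAC
normalization (lowercase hex, separators stripped) and the UTF-8 encoding
are assumptions of this sketch; the scheme above only specifies
SHA1(MAC+SSID). The name ap_hash_id is illustrative.

```python
import hashlib


def ap_hash_id(mac: str, ssid: str) -> str:
    """Return the hex SHA-1 digest of the MAC address concatenated with the SSID."""
    # Assumption: normalize the MAC so "00:11:22:AA:BB:CC" and
    # "00-11-22-aa-bb-cc" produce the same ID. The scheme does not say
    # how MACs are formatted before hashing.
    normalized_mac = mac.replace(":", "").replace("-", "").lower()
    # Assumption: the concatenated string is encoded as UTF-8 before hashing.
    return hashlib.sha1((normalized_mac + ssid).encode("utf-8")).hexdigest()
```

Whatever normalization is chosen, clients and the server have to agree on
it, since any difference in the input bytes yields an unrelated hash ID.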
Our private database maps access point hash IDs to locations (and other
metadata). Given the hash IDs:
H1 = Hash(AP1.MAC + AP1.SSID)
H2 = Hash(AP2.MAC + AP2.SSID)
the private database's schema looks something like:
H1 ==> AP1.latitude, AP1.longitude, ...
H2 ==> AP2.latitude, AP2.longitude, ...
Our published database would include two tables. The first table would
map a random row id to metadata about an anonymous access point:
Random1 ==> AP1.latitude, AP1.longitude, ...
Random2 ==> AP2.latitude, AP2.longitude, ...
The second table's primary key would be a hash of hashes. It would map a
hash of two neighboring access points' hash IDs to a row id of the first
table, with the first hash ID in the pair selecting whose row is returned.
Something like:
Hash(H1 + H2) ==> Random1
Hash(H2 + H1) ==> Random2
Someone querying the published database would need to know the MAC
addresses and current SSIDs of two neighboring access points to look up
either's location.
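
To make the two-table idea concrete, here is a rough Python sketch of
building the published tables from the private database and then querying
them. The function names (h, publish, lookup), the use of in-memory dicts,
and the 128-bit random row ids are all assumptions of this sketch, not
part of the design above.

```python
import hashlib
import secrets


def h(text: str) -> str:
    """Hex SHA-1 of a string; stands in for Hash() in the scheme."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def publish(private_db, neighbor_pairs):
    """Build the two published tables.

    private_db: dict mapping an AP hash ID to its location metadata.
    neighbor_pairs: iterable of (H1, H2) hash IDs of neighboring APs.
    """
    locations = {}   # first table: random row id -> metadata
    pair_index = {}  # second table: Hash(Ha + Hb) -> random row id
    row_ids = {}     # private bookkeeping only, never published
    for h1, h2 in neighbor_pairs:
        # Index both orders: Hash(H1 + H2) -> AP1's row, Hash(H2 + H1) -> AP2's row.
        for target, other in ((h1, h2), (h2, h1)):
            if target not in row_ids:
                row_ids[target] = secrets.token_hex(16)
                locations[row_ids[target]] = private_db[target]
            pair_index[h(target + other)] = row_ids[target]
    return locations, pair_index


def lookup(locations, pair_index, target_mac, target_ssid,
           neighbor_mac, neighbor_ssid):
    """Return the target AP's metadata, or None if the pair is unknown."""
    target_id = h(target_mac + target_ssid)
    neighbor_id = h(neighbor_mac + neighbor_ssid)
    row_id = pair_index.get(h(target_id + neighbor_id))
    return None if row_id is None else locations[row_id]
```

Neither published table contains H1 or H2 themselves, so a querier who
knows only one access point cannot form a valid key for the second table.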
btw, should we use SHA-2 instead of SHA-1? In 2009, NIST recommended
that "Federal agencies should stop using SHA-1 for applications that
require collision resistance as soon as practical, and must use the
SHA-2 family of hash functions for these applications after 2010."
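
For what it's worth, switching hash functions should be a small code
change. The sketch below assumes Python's hashlib and a hypothetical
ap_hash_id helper that takes the algorithm name as a parameter; the
default of "sha256" is only an illustration, not a decision.

```python
import hashlib


def ap_hash_id(mac: str, ssid: str, algorithm: str = "sha256") -> str:
    """Hash MAC+SSID with a named algorithm, e.g. "sha1" or "sha256"."""
    # Assumption: the concatenated string is encoded as UTF-8 before hashing.
    data = (mac + ssid).encode("utf-8")
    return hashlib.new(algorithm, data).hexdigest()
```

The visible difference is the ID length: SHA-1 produces 160-bit digests
(40 hex characters) and SHA-256 produces 256-bit digests (64 hex
characters), so the key columns would grow accordingly.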
Other layers of privacy protection include filtering out ad-hoc Wi-Fi
networks; MAC addresses with vendor prefixes from mobile device
manufacturers (e.g. Apple and HTC); SSIDs commonly associated with mobile
devices (e.g. "XXX's iPhone" and Google's "_nomap" opt-out); and APs
reported in multiple locations.
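
These filters could be sketched roughly as follows, in Python. Everything
here is illustrative: the Observation fields, the is_adhoc flag, the
placeholder vendor prefixes (not real vendor assignments), the SSID
pattern, and the 0.01 degree spread threshold are assumptions of this
sketch, not values we have settled on.

```python
import re
from dataclasses import dataclass

# Placeholder prefixes for illustration only; a real list would come
# from the IEEE vendor prefix registry.
MOBILE_VENDOR_PREFIXES = {"aa:bb:cc", "dd:ee:ff"}
# Matches SSIDs such as "XXX's iPhone".
MOBILE_SSID_PATTERNS = [re.compile(r"'s iPhone$", re.IGNORECASE)]
OPT_OUT_SUFFIX = "_nomap"


@dataclass
class Observation:
    mac: str          # colon-separated MAC address
    ssid: str
    is_adhoc: bool    # assumed to be reported by the scanning client
    latitude: float
    longitude: float


def is_publishable(obs: Observation) -> bool:
    """Apply the per-observation filters."""
    if obs.is_adhoc:
        return False
    if obs.mac.lower()[:8] in MOBILE_VENDOR_PREFIXES:
        return False
    if obs.ssid.endswith(OPT_OUT_SUFFIX):
        return False
    if any(p.search(obs.ssid) for p in MOBILE_SSID_PATTERNS):
        return False
    return True


def filter_observations(observations, max_spread_degrees=0.01):
    """Drop filtered observations and APs seen in widely separated places."""
    by_mac = {}
    for obs in observations:
        if is_publishable(obs):
            by_mac.setdefault(obs.mac.lower(), []).append(obs)
    kept = []
    for group in by_mac.values():
        lats = [o.latitude for o in group]
        lons = [o.longitude for o in group]
        # Crude spread check in degrees; a real filter would use distance.
        if (max(lats) - min(lats) <= max_spread_degrees
                and max(lons) - min(lons) <= max_spread_degrees):
            kept.extend(group)
    return kept
```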
thanks,
chris
_______________________________________________
dev-security mailing list
dev-security@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security