Re: Request for feedback on crypto privacy protections of geolocation data
On 11.09.2013, at 02:06 , Gervase Markham wrote:
> On 10/09/13 19:05, Chris Peterson wrote:
>> Our location service (and stumbler) also collects cell data, so we can
>> geolocate with Wi-Fi AP and/or cell data.
>
> Sure. But in the rural areas I am thinking about, cells cover many
> square km. The wifi access point has a much smaller range, and therefore
> geolocates a person much more precisely.
>
> So it would be awesome if I could say "I'm in this network cell, near
> this single access point - tell me where I am, please", and the service
> complied.

That's a good idea, I added a ticket about it at https://github.com/mozilla/ichnaea/issues/23

Hanno
___
dev-security mailing list
dev-security@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security
Re: Request for feedback on crypto privacy protections of geolocation data
On 10.09.2013, at 17:41 , Daniel Veditz wrote:
> That can't be right, so your database must be more complex. If you're
> storing more than originally implied that may have some impact on a
> security assessment.

We apparently haven't been clear about the scope of the proposal. It only deals with a way to export and publicly share a subset of our data. Internally the service has a lot more data, but there's no way we can share that, thanks to the privacy aspects of it.

But at this point it seems clear to me that there's likely no way to share any meaningful subset or aggregated version of this data publicly at all.

Hanno
Re: Request for feedback on crypto privacy protections of geolocation data
On 10.09.2013, at 20:23 , ianG wrote:
> On 11/09/13 03:27 AM, Daniel Veditz wrote:
>> "private" means we can't even /look/ at it, rather than merely can't
>> store it?
>
> The data regime might be simply put as this: you can't store a number
> suitable for tracking (or any derivative of it if that simply creates a new
> tracking number) unless you have a compelling business reason, and you have
> agreement.
>
> The EU data protection regime makes a very strong distinction about any
> private tracking information. It also goes to another level if you share
> that information with anyone.
>
> The initial simple answer is, don't go there. (I have no idea how google
> finessed this issue, or even if they didn't.)

Most of this is very much a gray area. The data privacy officers / protection agencies have generally recognized that location services based on wifi networks are a very useful service, and in order to practically run them, you have to be able to collect wifi bssids without getting the individual assent of every wifi AP operator. But at the same time they consider the combination of a bssid, timestamp and geolocation as personally identifiable information suitable for tracking. Much like IP addresses, or phone numbers.

So currently there's an unspoken agreement where industry players like Google, Microsoft and Apple have voluntarily put some restrictions into place. One of those is the introduction of the _nomap network name suffix, which was deemed an effective way for wifi operators to opt out of the data gathering (see for example http://www.dutchdpa.nl/Pages/en_pb_20120405_google-complies-with-Dutch-DPA-requirements.aspx). Another is the introduction of the "you need to know two nearby wifis to geolocate yourself" protection. This was a measure suggested and implemented first by Google based on media outcries and has now become an industry best practice. But it's not actually mandated by any official regulation to my knowledge.
For now the whole space hasn't seen official tight regulation and the industry players are allowed to continue to operate. But it's a fine balance and any new media outcries or questionable behavior can threaten this balance.

So for us this means trying to adhere to existing industry best practices and generally following data privacy best practices like: only gather and store what you need, delete data as soon as you don't need it anymore, etc. All of this applies to the hosted service use-case, where we keep the data internal and don't share or sell it for other purposes.

Since it's all unofficial agreements, it's very hard to impossible to know exactly what we should do for the "we want to publicly share this data" use-case.

Hanno
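As a rough illustration of the two voluntary measures mentioned in this message (the _nomap suffix and the "two nearby wifis" rule), here is a minimal Python sketch. It is not the code of any real service: the function names, the in-memory dictionary of known APs and the 0.5 km "nearby" threshold are all assumptions made for the example.

```python
import math
from itertools import combinations

NOMAP_SUFFIX = "_nomap"  # opt-out suffix on the network name


def is_opted_out(ssid):
    """True if the network name ends with the _nomap opt-out suffix."""
    return ssid.endswith(NOMAP_SUFFIX)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def may_answer(query_bssids, known_aps, max_km=0.5):
    """Answer only if the query names at least two known APs near each other.

    known_aps maps a bssid to its (lat, lon); max_km is a hypothetical
    threshold for "nearby", not a value taken from any real service.
    """
    known = [known_aps[b] for b in set(query_bssids) if b in known_aps]
    return any(haversine_km(p, q) <= max_km for p, q in combinations(known, 2))
```

A query that names a single AP, or two APs that are known to be far apart, gets no answer, which is what makes it hard to track one specific (possibly mobile) access point.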
Re: Request for feedback on crypto privacy protections of geolocation data
On 10.09.2013, at 03:39 , Gervase Markham wrote:
> BTW, how does the service figure out the lat/long of an AP? Do we do
> anything at all with signal strengths? Could we?

This is a bit off-topic for the security discussion. I suggest starting a new thread on dev-geolocation, if you want to know more about the technical details.

The short answer is: Yes, but it's a lot more complicated than that :)

Cheers :)
Hanno
Re: Request for feedback on crypto privacy protections of geolocation data
On 10.09.2013, at 03:46 , Gervase Markham wrote:
> On 10/09/13 10:48, ianG wrote:
>> If that is the case, why not flip it around. Instead of trying to
>> interpolate the existing data that is broadcast out there, why not write
>> a protocol to broadcast the direct location from the wireless access point?
>
> Because only a tiny, tiny fraction of devices would run it, and for most
> of those, the user wouldn't have correctly set the device's location
> anyway, and for some of them, they'd have set it and then moved.
>
> This is a "boil the sea" approach to the problem.

In addition, the CDMA cell networks actually have support for reporting the base station's lat/lon as part of the protocol. But in practice these are almost never set, as cell operators value ease of deployment and uniform configuration more than providing this extra service.

In another anecdote, mobile operators cannot actually give you lists of all their cell towers and locations - we asked our partners. Thanks to a multitude of subsidiaries, subcontractors and partnerships, they often don't actually know how many cell towers they have and where they are. The same problem applies to the many wifi APs officially being operated by some large telco.

So even where this is possible, it's not actually a practically relevant approach.

Hanno
Re: Request for feedback on crypto privacy protections of geolocation data
On 10.09.2013, at 03:46 , Gervase Markham wrote:
> On 10/09/13 00:25, R. Jason Cronk wrote:
>> What happens if I move?
>
> The raw database notes that you are now being detected in a new
> location. What happens then is up for debate. I'd argue that if your
> position was fixed for N months before, and it seems fixed again now, we
> should assume you have moved house and keep the point in the DB. APs
> which seem to move a lot, or move regularly, should be excluded.

As of this moment, we filter out any AP that has been detected in two different places (where different means more than ~1km away from each other). This is a very conservative approach and we'll relax that later.

The real strategy is going to involve thresholds of a certain number of reports over a certain time period. For example we could change the "canonical location" after an AP has been seen 10 times over the course of at least one week in a new location. Potentially we could flag it as "dirty" after seeing just two reports in a different location and remove it from the search results, while we wait for more reports and "confirmation". The exact numbers will depend on the reporting volume and spam problems.

Anecdotal reports suggest it takes Apple and Google about a day to realize your AP has actually moved.

Hanno
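The threshold idea in this message can be sketched in a few lines of Python. This is only an illustration using the example numbers from the text (two reports to flag an AP as "dirty", ten reports over at least one week to accept a new canonical location, ~1 km as "a different place"). The ApRecord class, the observe function and the handling of a third location are assumptions for the example, not the service's actual logic.

```python
import math
from dataclasses import dataclass, field

SAME_PLACE_KM = 1.0            # "different" means more than ~1 km apart
DIRTY_REPORTS = 2              # hide the AP from search results after this many
CONFIRM_REPORTS = 10           # reports needed to accept the new location
CONFIRM_SPAN_S = 7 * 24 * 3600 # ... spread over at least one week


@dataclass
class ApRecord:
    canonical: tuple                 # (lat, lon) currently served to clients
    dirty: bool = False              # True while a move is unconfirmed
    pending: list = field(default_factory=list)  # (timestamp, (lat, lon))


def distance_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def observe(rec, ts, pos):
    """Process one report; timestamps are assumed to arrive in ascending order."""
    if distance_km(rec.canonical, pos) <= SAME_PLACE_KM:
        return  # consistent with the known location
    if rec.pending and distance_km(rec.pending[0][1], pos) > SAME_PLACE_KM:
        rec.pending = []  # yet another place: restart the candidate location
    rec.pending.append((ts, pos))
    if len(rec.pending) >= DIRTY_REPORTS:
        rec.dirty = True
    span = rec.pending[-1][0] - rec.pending[0][0]
    if len(rec.pending) >= CONFIRM_REPORTS and span >= CONFIRM_SPAN_S:
        rec.canonical = rec.pending[0][1]  # simplification: first new report wins
        rec.pending = []
        rec.dirty = False
```

A search API built on this would skip records with dirty set to True until the move is confirmed.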
Re: Request for feedback on crypto privacy protections of geolocation data
On 09.09.2013, at 18:41 , Eric Rescorla wrote:
> 1. How do I bootstrap? I turn on my device and want to get the coordinates of
> the aps I see. That requires a lat long for neighbors. What now?

We build the database by having people use a stumbler application to send us observations. The stumbler app uses the mobile phone's GPS sensor to know its location. It reports all cell towers and wifi APs it sees to us in a certain location. We crunch some data, then we make a search API available over this data. Later someone else asks us what their location is, based on seeing cell towers or APs.

> 2. As asked previously will the db be published or query able?

It will definitely be queryable, but with a lot of restrictions to enhance privacy. We would like to publish it or as much of it as possible, but it's unclear how to do that, when a lot of the individual records are considered personally identifiable information.

> 3. What is the lat/long resolution? How is it measured?

The resolution differs, but is generally "as precise as it gets". GPS sensors often have 5 meter precision, and Google aims to do 1 meter resolution for indoor locations based on Wifi access points. Internally we currently store things with centimeter precision and timestamps in milliseconds - so definitely all on the far side of "extremely detailed / private".

Hanno
Re: Request for feedback on crypto privacy protections of geolocation data
On 09.09.2013, at 18:13 , Brian Smith wrote:
> On Mon, Sep 9, 2013 at 2:58 PM, Chris Peterson wrote:
>> Google's Location Service prevents people from tracking individual access
>> points by requiring requests to include at least 2-3 access points that
>> Google knows are near each other. This "proves" the requester is near the
>> access points.
>
> I assume by "prevents people from tracking individual access points"
> means the following: Some people have a personal access point on them
> (e.g. in their phone). If somebody knows the SSID and MAC of this
> personal access point, then they could track this person's location by
> polling the database for that (SSID, MAC) pair. Google tries to limit
> this type of abuse as much as practical while providing still
> providing a location service based on such crowdsourced data.

Yes :) Though there's one crucial difference between Google and us: We would like to make as much of this data public as possible, while Google will always just provide a service without access to the underlying data.

>> Unlike Google's Location Service, our server does not store MAC addresses or
>> SSIDs. We identify access points by hash IDs, specifically SHA1(MAC+SSID).
>> To query the location of an access point in the database, you must know both
>> its MAC address and current SSID.
>
> MAC addresses are 48 bits. SSIDs are often guessable or predictable.
> Therefore, using the H(MAC+SSID) instead of just the plain MAC+SSID is
> not buying you much in terms of privacy, IMO. Basically, if you are
> really trying to use this as a privacy mechanism then you should store
> the MAC+SSID according to best practices for storing passwords. For
> example, use PBKDF2 with a large number of iterations. Regardless of
> whether you use SHA1, SHA2, PBKDF2, or something else, I will still
> call whatever function you use H(x). But, I am not sure that switching
> to PBKDF2 even buys you much improved privacy protection.
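For readers who want to see the two candidates for H(x) side by side, here is a minimal Python sketch using the standard library's hashlib.sha1 and hashlib.pbkdf2_hmac. The thread does not specify how MAC and SSID are encoded or concatenated, so the UTF-8 string concatenation, the fixed salt and the iteration count below are assumptions made for the example.

```python
import hashlib

# Assumption: a single fixed, public salt. Anyone querying by key must be able
# to recompute the same value, so a per-record random salt would not work here.
FIXED_SALT = b"hypothetical-public-salt"


def ap_key_sha1(mac, ssid):
    """Plain SHA1(MAC+SSID) as a hex string."""
    return hashlib.sha1((mac + ssid).encode("utf-8")).hexdigest()


def ap_key_pbkdf2(mac, ssid, iterations=100_000):
    """PBKDF2-HMAC-SHA256 over MAC+SSID; slower to compute, so slower to brute-force."""
    data = (mac + ssid).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", data, FIXED_SALT, iterations).hex()
```

Both functions are deterministic, so either can serve as a lookup key. The difference is only the cost per guess for someone enumerating MAC and SSID candidates, which matches the doubt raised above about how much PBKDF2 really buys.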
We were looking for two things by using the sha1:

- Make it possible for the end-user to change their unique value (they cannot change the mac address, but they can change the ssid). This allows them to "invalidate" historical records in the database.
- Make it harder for spammers to "guess" actual unique keys and flood our service. Mac addresses have a vendor prefix, which makes it rather easy to generate lots of valid mac addresses. Taking the ssid into account makes it harder to generate valid keys.

Unfortunately the ssid itself is considered private data in European countries, so you aren't allowed to store it without the user's consent. That's why Google and everyone else have stopped storing ssids and only use mac addresses now. The sha1 scheme might be ineffective in doing this.

>>    H1 = Hash(AP1.MAC + AP1.SSID)
>>    H2 = Hash(AP2.MAC + AP2.SSID)
>>
>> Our private database's schema looks something like:
>>
>>    Hash(AP1.MAC + AP1.SSID) ==> AP1.latitude, AP1.longitude, ...
>>    Hash(AP2.MAC + AP2.SSID) ==> AP2.latitude, AP2.longitude, ...
>>
>> Our published database would include two tables. The first table would map a
>> random row id to metadata about an anonymous access point:
>>
>>    Random1 ==> AP1.latitude, AP1.longitude, ...
>>    Random2 ==> AP2.latitude, AP2.longitude, ...
>>
>> The second table's primary key would be a hash of hashes. It would map a
>> hash of two neighboring access points' hash IDs to a row id of the first
>> table. Something like:
>>
>>    Hash(H1 + H2) ==> Random1
>>    Hash(H2 + H1) ==> Random2
>>
>> Someone querying the published database would need to know the MAC addresses
>> and current SSIDs of two neighboring access points to look up either's
>> location.
>
> If you know the MAC+SSID of person X's personal access point and the
> MAC+SSID of person Y's personal access point, then you can use this
> database to ask the question "are person X and person Y in the same
> location?" This seems bad. I see that you attempt to address this
> below.
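The two-table layout quoted above can be made concrete with a small Python sketch. It follows the quoted mapping (Hash(H1 + H2) ==> Random1, Hash(H2 + H1) ==> Random2) but everything else is assumed for illustration: SHA-1 over concatenated hex strings as the hash, secrets.token_hex for the random row ids, plain dictionaries as tables, and the function names build_published and lookup.

```python
import hashlib
import secrets


def h(text):
    """Stand-in for Hash(x) in the quoted scheme; SHA-1 is used as in the thread."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def build_published(private, neighbours):
    """Derive the two published tables from the private one.

    private:    {ap_hash: (lat, lon)}, keyed by Hash(MAC + SSID)
    neighbours: iterable of (ap_hash_a, ap_hash_b) pairs seen near each other
    """
    row_ids = {ap: secrets.token_hex(16) for ap in private}
    # Table 1: random row id ==> location of an anonymous access point
    locations = {row_ids[ap]: loc for ap, loc in private.items()}
    # Table 2: hash of two neighbouring hash IDs ==> row id of the first AP
    pairs = {}
    for a, b in neighbours:
        pairs[h(a + b)] = row_ids[a]
        pairs[h(b + a)] = row_ids[b]
    return locations, pairs


def lookup(locations, pairs, own, neighbour):
    """Location of `own`, available only if `neighbour` is a known neighbour."""
    row = pairs.get(h(own + neighbour))
    return locations.get(row) if row is not None else None
```

Note that the sketch also reproduces the concern raised in the reply: anyone who knows the hash IDs of two personal access points can call lookup to test whether they were ever recorded as neighbours.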
On the service level, we can prevent this by adding extra thresholds, like filtering out "moving" APs and only reporting APs which have been seen in the same location a number of times over a minimum time period. But this doesn't help us when publishing the underlying data.

>> btw, should we use SHA-2 instead of SHA-1?
>
> There is no reason to use SHA-1 when you have SHA-2 available.
> However, as I indicated above, it isn't clear it is a good idea to be
> using any plain hash function as H(x).
>
>> Other layers of privacy protection include filtering out ad-hoc Wi-Fi
>> networks; MAC addresses with vendor prefixes from mobile device manufacters
>> (e.g. Apple and HTC); SSIDs commonly associated with mobile devices (e.g.
>> "XXX's iPhone" and Google's "_nomap" opt-out); and APs reported in multiple
>> locations.
>
> I think that these things are much more important than the protection
> offered by H(x). My concern is that if you store the data on the
> server as