Keeping this on the public list - warning: it's a bit of a brain-dump / rant :)

I have to admit that the main reason I took a look at the map was
performance / scaling. Using the number of data points per grid to indicate
something was the only user-visible change I could implement in a short time.
Abusing point density to indicate recency may not be the best choice.

Today we have about 400 million grids (100x100 meter areas) as input into the 
mapmaking. The way we generated the map took close to 10 hours each day. I was 
able to speed it up by parallelizing more of it, but it still takes 6 hours 
each night.

If we were to use smaller grids, we'd increase the number of grids and the
processing time. With 30x30 meter grids, we'd probably end up with roughly
5 times as many grids. You can fit about 11 smaller grids into one big grid
(10000 sqm ~= 11 x 900 sqm), but from experience we know that only about half
of the smaller grids would actually have data points in them (in cities most of
them have data, on country roads most of them are empty).

This means we'd end up with 2 billion grids and the map would take more than 24 
hours to generate, which is rather silly.
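
As a rough sanity check of the numbers above, here is the same estimate as a
few lines of Python. This is only a sketch: the function name is made up for
illustration, and it assumes that processing time grows linearly with the
number of grids, which is a simplification and not something we measured.

```python
def estimate_small_grid_cost(current_grids, big_side_m, small_side_m,
                             occupied_fraction, hours_now):
    """Estimate grid count and runtime after switching to smaller grids.

    Assumption (not measured): runtime scales linearly with grid count.
    """
    # How many small grids fit into one big grid by area (~11 for 100 m vs 30 m).
    per_big_grid = (big_side_m ** 2) / (small_side_m ** 2)
    # Only a fraction of the small grids would actually contain data points.
    factor = per_big_grid * occupied_fraction
    grids = current_grids * factor
    hours = hours_now * factor
    return factor, grids, hours


# Figures taken from this mail: 400 million grids, 6 hours per night,
# about half of the smaller grids occupied.
factor, grids, hours = estimate_small_grid_cost(
    current_grids=400_000_000,
    big_side_m=100,
    small_side_m=30,
    occupied_fraction=0.5,
    hours_now=6,
)
print(f"factor ~{factor:.1f}, grids ~{grids / 1e9:.1f} billion, ~{hours:.0f} hours")
```

Under that linear assumption the run would take well over a day, which matches
the conclusion above.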

We are now at a point where it makes more sense to build maps showing the
locations of all the cell and WiFi networks. WiFi networks tend to be
colocated, so our 440 million WiFi networks only result in about 220 million
locations when rounded to 10 meter precision. The 18 million cells hardly
matter from a scaling perspective.
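
To illustrate how rounding collapses colocated networks into fewer map
locations, here is a minimal sketch. It assumes that rounding latitude and
longitude to four decimal places is a good enough stand-in for "10 meter
precision" (that is about 11 meters in latitude, and less in longitude away
from the equator). The function name and the sample coordinates are made up
for illustration and this is not necessarily how the 220 million figure was
computed.

```python
def unique_locations(networks, decimals=4):
    """Collapse (lat, lon) pairs onto a coarse grid and drop duplicates.

    Assumption: four decimal places approximate 10 meter precision.
    """
    return {(round(lat, decimals), round(lon, decimals)) for lat, lon in networks}


# Hypothetical sample: the first two networks sit in the same building.
sample_networks = [
    (52.520008, 13.404954),
    (52.520031, 13.404961),
    (48.137154, 11.576124),
]
locations = unique_locations(sample_networks)
print(len(sample_networks), "networks ->", len(locations), "locations")
```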

220 million is already lower than the number of grids we have today, and at
much better precision. For cell networks we'd really want different maps to
show different carriers and radio standards, so one could look at only AT&T's
LTE networks or only Deutsche Telekom's GSM networks.
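
One way to think about such per-carrier maps is to split the cell locations
into one layer per (carrier, radio standard) pair before rendering. The sketch
below only shows the grouping step. The record layout with "carrier", "radio",
"lat" and "lon" keys is hypothetical, as are the function name and the sample
data.

```python
from collections import defaultdict


def split_into_layers(cells):
    """Group cell locations by (carrier, radio standard).

    Assumption: each cell is a dict with hypothetical keys
    "carrier", "radio", "lat" and "lon".
    """
    layers = defaultdict(list)
    for cell in cells:
        key = (cell["carrier"], cell["radio"])
        layers[key].append((cell["lat"], cell["lon"]))
    return dict(layers)


# Hypothetical sample data, not real observations.
sample_cells = [
    {"carrier": "AT&T", "radio": "LTE", "lat": 37.7749, "lon": -122.4194},
    {"carrier": "AT&T", "radio": "LTE", "lat": 37.7750, "lon": -122.4180},
    {"carrier": "Deutsche Telekom", "radio": "GSM", "lat": 52.5200, "lon": 13.4050},
]
layers = split_into_layers(sample_cells)
for (carrier, radio), points in layers.items():
    print(carrier, radio, len(points))
```

Each layer could then be rendered as its own map, so a stumbler only sees the
networks of the carrier and radio standard they care about.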

Those kinds of maps would actually tell you whether or not someone collected 
WiFi networks or cell networks from your carrier in a specific place. For the 
purpose of "should I visit this street" these maps would be more useful. The 
datamap we have today only says that someone went to a place and collected at 
least one data point. But it could have been a single cell entry for a GSM 
network of a different carrier and no WiFi data at all.

Unfortunately we don't know how to build these maps at the scale we have. All
the standard mapping tools only perform well up to about 10000 points on
desktop browsers and more like 100 to 1000 points on low-cost Android phones.
Since the target users would primarily be Mozilla Stumbler users, we have to
aim for low-cost Android phones with screen sizes of ~300x500 pixels as the
main audience.

The datamap we have is nice eye candy and looks cool. It shows the general
global momentum and reach of the project. But there is no way to tweak it to
really do the "should I walk / drive down this street" use case justice. We
only keep rendering it at fairly high zoom levels because we can and because
we don't have anything better.

If anyone wants to help figure out how to build maps of cell or WiFi
networks, we need you :)

Hanno

> On 17.11.2015, at 01:36, Felix Baumann <[email protected]> wrote:
> 
> Hi Hanno,
> 
> awesome that you are working on the map again!
> 
> It's nice that you can kinda distinguish older and newer entries now.
> But newer entries now simply look like their coverage is denser.
> (I'm not sure if I like it, but it's a way to distinguish routes and it works)
> 
> I know you are tired of the discussion about privacy,
> but I have another suggestion, now that I see the new map style:
> What do you think about making older entries more precise than newer ones?
> 
> This would help a lot in improving the map's precision and therefore Ichnaea
> might get a lot more precise as well, because the stumblers won't have to
> remember which streets they already stumbled and which not and can therefore
> stumble streets which were previously unstumbled.
> 
> The home of persons stumbling would still be obfuscated, because the map
> would be blurred there, so it shouldn't be much of a problem privacy-wise.
> Only older reports where stumblers haven't been for months would be accurate.
> 
> But at least other positions where people weren't stumbling for months would
> be precise so that the coverage could be optimized.
> 
> This would improve the following issue:
> In cities/villages, stumbling one street will mark adjacent streets as well,
> even though you didn't stumble many of the wifis in the other street. (Often if
> you look at the map the dots aren't placed on the corresponding streets where
> they were recorded but next to them -> between two streets)
> 
> IMO the map should still be a bit obfuscated (whether a position has been
> stumbled lots of times or just once shouldn't be visible), but the position of
> the older reports should be precise. And the dots should be small(er).
> 
> This could be achieved by taking all records out of a square of 30m x 30m
> (roughly 30m diameter, a circle would be nicer but it's harder to calculate)
> and placing a dot at their average position (not the middle). So if most of
> them are at 5x10 then the dot should be there instead of 15x15.
> 
> This would hide lots of visits but show an accurate position.
> 
> This could be adjusted: if there are more than 100(0) visits it could be an
> accuracy of 20x20 or 10x10. This would result in a lot more dots but shouldn't
> affect privacy, because if the number of visits is high enough, then it gets
> unlikely that it is just somebody's home. (Rather a train station or a sight)
> 
> One reason why it might be time to rethink the privacy strategy behind the map:
> MLS has grown a lot since the decision was made that the map should
> be obfuscated. It's not that easy anymore to see someone's home on the map.
> And there are many people that participate in it now.
> 
> 
> Do you think this idea preserves the stumblers' privacy enough?
> 
> 
> Regards,
> Felix

_______________________________________________
dev-geolocation mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-geolocation
