Hi Bertrand,

I agree that web server logs are the first input for starting work on DeviceMap data. I don't think it would be a good idea to work on whichever web server is immediately available, such as apache.org.
I see the following process to be put in place:

1. analyze logs from RELEVANT web servers
2. use the AVAILABLE TOOLS to discover the device model and to find device properties
3. release a DeviceMap data snapshot
4. wait 30 days
5. goto 1

By RELEVANT I mean relevant in terms of geographical coverage and of device access.

Geographical coverage is very important, because device manufacturers do not distribute [mobile] devices homogeneously. Each market is different: while in the US today the traffic from connected devices would be 9X% desktop browsers+iOS+Android, there are emerging markets where a significant share comes from feature phones, set-top boxes, etc.

By "relevant in terms of device access" I mean that the server to be analyzed should host content suitable for consumption by heterogeneous devices: weather forecasts are generic content, while a web server distributing developer tools would not be generic enough to be accessed by heterogeneous devices.

By AVAILABLE TOOLS I mean both manual analysis and searching the Internet and, hopefully soon, the Web crawlers developed within the DeviceMap project to grab information from publicly available sources.

P.S. I'm trying "Clean and Build" but the process doesn't run... can you please fix it? ;)

-Stefano

On 03/feb/2012, at 18.30, Bertrand Delacretaz wrote:

> Hi,
>
> Another idea discussed with Philip is using web server logs to find
> out which User-Agent values are out there.
>
> We can probably get logs from apache.org (which gets tons of traffic,
> not sure how much mobile but that's probably growing), and we could
> ask for contributions of more suitably anonymized logs from other
> websites.
>
> WDYT?
> -Bertrand
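
As a rough illustration of the log analysis discussed in this thread, here is a minimal sketch of counting User-Agent values in an access log. It assumes the logs use the Apache "combined" log format, where the User-Agent is the last double-quoted field. The function name count_user_agents and the regular expression are hypothetical, not part of any DeviceMap tool.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, e.g.
# host ident user [time] "request" status bytes "referer" "user-agent"
# User-Agent strings containing escaped quotes are not handled here.
COMBINED_LOG = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def count_user_agents(lines):
    """Return a Counter mapping each User-Agent string to its hit count.

    Lines that do not match the assumed format are skipped.
    """
    counts = Counter()
    for line in lines:
        match = COMBINED_LOG.match(line)
        if match:
            counts[match.group("ua")] += 1
    return counts
```

Calling count_user_agents(open("access.log")) and then most_common() on the result would list the most frequent User-Agent strings first, which could serve as the input to the device identification step.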
