Hi,

On 9 February 2012 08:40, Bertrand Delacretaz <[email protected]> wrote:
> On Tue, Feb 7, 2012 at 11:30 PM, Stefano Andreani
> <[email protected]> wrote:
> > ...I see the following process to be put in place:
> > 1. analyze logs form RELEVANT web servers...
>
> > With RELEVANT I mean in terms of geographical coverage and of device
> > access...
>
> Agree once we go into details of specific devices, but just having the
> set of User-Agent values seen on any busy website allows us to
> discover which User-Agent values are missing from the devicemap
> repository - which is the first step towards having as many devices in
> there as possible.

'asql' is quite a nice way to look at log data
(http://www.steve.org.uk/Software/asql/).

asql> load /var/log/apache2/access.log
asql> select distinct(agent) from logs group by agent limit 10;
-
Apple-PubSub/65.21
Apple-PubSub/65.23
Apple-PubSub/65.28
Baiduspider+(+http://www.baidu.com/search/spider.htm)
Baiduspider-image+(+http://www.baidu.com/search/spider.htm)
Biz360 spider ([email protected]; http://www.biz360.com)
BlackBerry8520/5.0.0.592 Profile/MIDP-2.1 Configuration/CLDC-1.1 VendorID/168
BlackBerry8530/5.0.0.654 Profile/MIDP-2.1 Configuration/CLDC-1.1 VendorID/389
BlackBerry9530/5.0.0.1041 Profile/MIDP-2.1 Configuration/CLDC-1.1 VendorID/105

And it can be scripted, so:

$ asql --load /var/log/apache2/access.log --execute 'select distinct(agent) from logs group by agent limit 100;' > /tmp/ua.txt

It's probably not optimal on large log files, as it creates a temporary
SQLite database, but it's nice for getting quick results without lots of
sed and awk.

I guess the next step is to define what shape the devicemap UA
repository should take. Suggestions?

Andrew.
-- 
[email protected] / [email protected]
http://www.andrewsavory.com/
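
P.S. For log files too large to load comfortably into a temporary SQLite
database, a single streaming pass might be enough. Below is a minimal
sketch in Python, assuming the logs are in Apache's Combined Log Format
(where the User-Agent is the last quoted field on each line). The function
names are made up for illustration; none of this is part of asql or
devicemap.

```python
import re
from collections import Counter

# Assumption: Combined Log Format, where each line ends with
#   "referer" "user-agent"
# Escaped quotes (\") inside either field are tolerated.
_TAIL_RE = re.compile(
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"\s*$'
)


def count_user_agents(lines):
    """Count User-Agent values over an iterable of log lines.

    Lines that do not end with two quoted fields are skipped.
    """
    counts = Counter()
    for line in lines:
        match = _TAIL_RE.search(line)
        if match:
            counts[match.group("agent")] += 1
    return counts


def distinct_user_agents(lines):
    """Return the distinct User-Agent values, sorted alphabetically."""
    return sorted(count_user_agents(lines))
```

Passing an open file object, for example
distinct_user_agents(open("/var/log/apache2/access.log", errors="replace")),
should give roughly the same list as the asql query above, without the
limit and without holding more than the set of distinct values in memory.
The counts from count_user_agents() may also help decide which missing
User-Agent values to look at first.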
