Hi,

On 9 February 2012 08:40, Bertrand Delacretaz <[email protected]> wrote:


> On Tue, Feb 7, 2012 at 11:30 PM, Stefano Andreani
> <[email protected]> wrote:
> > ...I see the following process to be put in place:
> > 1. analyze logs from RELEVANT web servers...
>
> > With RELEVANT I mean in terms of geographical coverage and of device
> > access...
>
> Agree once we go into details of specific devices, but just having the
> set of User-Agent values seen on any busy website allows us to
> discover which User-Agent values are missing from the devicemap
> repository - which is the first step towards having as many devices in
> there as possible.
>

'asql' is quite a nice way to look at log data (
http://www.steve.org.uk/Software/asql/).

asql> load /var/log/apache2/access.log
asql> select distinct(agent) from logs group by agent limit 10;
-
Apple-PubSub/65.21
Apple-PubSub/65.23
Apple-PubSub/65.28
Baiduspider+(+http://www.baidu.com/search/spider.htm)
Baiduspider-image+(+http://www.baidu.com/search/spider.htm)
Biz360 spider ([email protected]; http://www.biz360.com)
BlackBerry8520/5.0.0.592 Profile/MIDP-2.1 Configuration/CLDC-1.1 VendorID/168
BlackBerry8530/5.0.0.654 Profile/MIDP-2.1 Configuration/CLDC-1.1 VendorID/389
BlackBerry9530/5.0.0.1041 Profile/MIDP-2.1 Configuration/CLDC-1.1 VendorID/105

And it can be scripted, so:
$ asql --load /var/log/apache2/access.log --execute 'select distinct(agent) from logs group by agent limit 100;' > /tmp/ua.txt

It's probably not optimal on large log files as it creates a temporary
SQLite database, but it's nice for getting quick results without lots of
sed and awk.
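
For bigger logs, a single streaming pass would avoid the temporary
database altogether. Here is a rough sketch in Python, not something I
have run against real traffic. It assumes the Apache "combined" log
format, where the User-Agent is the last double-quoted field on the
line, and the function name count_user_agents is just made up for
illustration.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, so each line ends with
#   "referer" "user-agent"
# Lines that do not match (including agents containing escaped quotes)
# are skipped rather than guessed at.
_TAIL_RE = re.compile(r'"[^"]*" "(?P<agent>[^"]*)"\s*$')


def count_user_agents(lines):
    """Count User-Agent values, reading the input one line at a time."""
    counts = Counter()
    for line in lines:
        match = _TAIL_RE.search(line)
        if match:
            counts[match.group("agent")] += 1
    return counts


# Hypothetical usage: a file object is iterated lazily, so the whole
# log is never held in memory.
#
#   with open("/var/log/apache2/access.log", errors="replace") as log:
#       for agent in sorted(count_user_agents(log)):
#           print(agent)
```

Keeping the counts rather than just the distinct values costs nothing
extra and shows which missing User-Agents are actually common.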

I guess the next step is to define what shape the devicemap UA repository
should take. Suggestions?


Andrew.
--
[email protected] / [email protected]
http://www.andrewsavory.com/
