Storing and analyzing user agent strings, general approach

Mark Dodwell Thu, 26 Jun 2014 00:10:04 -0700

I want to store documents in Elasticsearch, each representing a hit to a 
website and including the user agent of the client that made the original 
HTTP request.


User agent strings vary a lot, and the useful parts (OS, browser, version, 
etc.) need to be parsed out. I would like to be able to perform 
aggregations on those extracted features.

The simplest way I can think of to do this would be to parse the user agent 
string before indexing the document. The downside of this approach is that 
as new or different user agent strings emerge (which is likely), you would 
have to keep updating the parser, and documents that are already indexed 
would still hold the old parser's output.
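
To make the index-time option concrete, here is a minimal sketch in Python. 
The regex rules, the field names and the build_hit_document helper are all 
hypothetical and deliberately incomplete; a real parser would need far more 
rules. The raw string is stored next to the extracted features so that it 
is still available if the rules change later.

```python
import re

# Hypothetical, deliberately incomplete rules for illustration only.
# Order matters: Chrome user agents also contain "Safari", Android ones
# contain "Linux", and iOS ones contain "like Mac OS X".
BROWSER_RULES = [
    ("Chrome", re.compile(r"Chrome/(\d+)")),
    ("Firefox", re.compile(r"Firefox/(\d+)")),
    ("Safari", re.compile(r"Version/(\d+).*Safari/")),
]
OS_RULES = [
    ("Windows", re.compile(r"Windows NT")),
    ("Android", re.compile(r"Android")),
    ("iOS", re.compile(r"iPhone|iPad")),
    ("Mac OS X", re.compile(r"Mac OS X")),
    ("Linux", re.compile(r"Linux")),
]


def parse_user_agent(ua):
    """Extract browser, major version and OS; unknown values become 'Other'/None."""
    browser, version = "Other", None
    for name, pattern in BROWSER_RULES:
        match = pattern.search(ua)
        if match:
            browser, version = name, int(match.group(1))
            break
    os_name = next((name for name, p in OS_RULES if p.search(ua)), "Other")
    return {"browser": browser, "browser_version": version, "os": os_name}


def build_hit_document(url, ua):
    """Build the document to index: raw user agent plus extracted features."""
    return {"url": url, "user_agent": {"raw": ua, **parse_user_agent(ua)}}
```

The resulting dictionary is what would be sent to the index; aggregations 
would then run on the extracted fields rather than on the raw string.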

This may be impossible or undesirable for a number of reasons, but what I'd 
really like to do is index the raw user agent string and then perform the 
analysis/feature extraction post hoc, at query time. Any ideas or pointers 
on how to do this?

Aggregators? Custom analyzers? (How would you handle an update to the 
analyzer? Would you need to re-run it against all existing stored data?)
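
To illustrate the re-run question, here is a rough sketch in Python of one 
possible way to handle it, not a feature of Elasticsearch itself: each 
document records which parser version produced its features, and only 
stale documents are re-parsed from the stored raw string. PARSER_VERSION, 
the field names and the placeholder parser are assumptions, and the 
in-memory list stands in for scanning the stored documents.

```python
PARSER_VERSION = 2  # hypothetical: bumped whenever the parsing rules change


def parse_user_agent(ua):
    """Placeholder parser for illustration; a real one would also extract OS, version, etc."""
    for name in ("Chrome", "Firefox"):
        if name + "/" in ua:
            return {"browser": name}
    return {"browser": "Other"}


def reparse_stale(docs):
    """Re-derive features for documents parsed by another parser version.

    Mutates the documents in place and returns how many were updated.
    """
    updated = 0
    for doc in docs:
        if doc.get("parser_version") == PARSER_VERSION:
            continue  # already parsed with the current rules
        doc.update(parse_user_agent(doc["user_agent_raw"]))
        doc["parser_version"] = PARSER_VERSION
        updated += 1
    return updated
```

This only works if the raw user agent string is kept alongside the 
extracted features, which is an argument for storing both.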
