On Wed, 19 Mar 2014, Xuri Nagarin wrote:
IMHO, you really don't want your log collection tier doing data
enrichment because now the collection tier has a dependency on the
enrichment source. If the enrichment source stops responding, your
pipeline can break. Even if the enrichment source does not fail, your
collection will slow down to the level of responsiveness of the
enrichment source. A simple example is DNS lookups. For large volumes
of logs, a DNS outage can wreak havoc on your log collection tier. I
have seen instances of the log collection tier effectively DoS-ing the
network because DNS infra failed, which is the last thing you want.
All very true, but the approach that we were looking at with table lookup is to
load the geoIP lookup tables in RAM and then use those, so you would not have
the risk of it causing such a slowdown.
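A minimal Python sketch of the in-RAM table lookup idea. It assumes the GeoIP data has already been exported to rows of (first IP, last IP, label) with non-overlapping, inclusive IPv4 ranges. That row format, the RangeTable class, and the sample ranges and XA/XB labels are all hypothetical. They are not a real GeoIP database format or rsyslog API. Once the table is loaded, each lookup is a binary search over the sorted range starts, so no network or disk access happens per message.

```python
import bisect
import ipaddress
from typing import Iterable, Optional, Tuple


class RangeTable:
    """In-memory IPv4 range -> label table (hypothetical input format).

    Ranges must not overlap. They are sorted once at load time so that each
    lookup is a binary search in RAM, with no DNS, network, or disk access.
    """

    def __init__(self, rows: Iterable[Tuple[str, str, str]]) -> None:
        # rows: (first_ip, last_ip, label), bounds inclusive, IPv4 only
        parsed = sorted(
            (int(ipaddress.IPv4Address(first)), int(ipaddress.IPv4Address(last)), label)
            for first, last, label in rows
        )
        self._starts = [start for start, _, _ in parsed]
        self._ends = [end for _, end, _ in parsed]
        self._labels = [label for _, _, label in parsed]

    def lookup(self, ip: str) -> Optional[str]:
        n = int(ipaddress.IPv4Address(ip))
        # Position of the last range whose start is <= n
        i = bisect.bisect_right(self._starts, n) - 1
        if i >= 0 and n <= self._ends[i]:
            return self._labels[i]
        return None  # address not covered by any range


if __name__ == "__main__":
    # Documentation address ranges with made-up labels, for illustration only
    table = RangeTable([
        ("192.0.2.0", "192.0.2.255", "XA"),
        ("198.51.100.0", "198.51.100.255", "XB"),
    ])
    print(table.lookup("198.51.100.7"))  # XB
    print(table.lookup("10.0.0.22"))     # None
```

The cost of this approach is paid once, when the table is loaded. A slow or unavailable external service cannot stall a lookup because no external service is consulted.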
Without mentioning specific tools, a good enrichment architecture
binds raw data with enrichment sources at query time. For example, you
throw log data in a database. You then have a layer of log query APIs
that pull raw data from the database, join it with static enrichment
sources (also maybe stored in the database) and also join dynamic
enrichment sources. At a 10,000 ft level, your log query app makes an
API call to the logQuery layer and asks for, say, firewall logs as:
{TimeStamp, FirewallName, SourceIP, SourceIPCountry, DestinationIP,
DestinationIPCountry, RblRatingOfSourceIP, RblRatingOfDestinationIP}.
In the backend, the logQuery layer grabs raw data from the database,
grabs GeoIP data from either a local file or some cloud service, and grabs RBL
info dynamically from a public RBL source.
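Here is a rough Python sketch of the query-time design, shown as a logQuery layer. The function names, the dict standing in for a local GeoIP file, and the rbl_rating stub standing in for a public RBL query are all hypothetical. Only the field names come from the firewall example. Rows are stored raw, and every enricher runs each time a query runs.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

Row = Dict[str, Any]
Enricher = Callable[[Row], None]


def make_ip_enricher(field_in: str, field_out: str,
                     lookup: Callable[[str], Optional[Any]]) -> Enricher:
    """Build an enricher that sets row[field_out] = lookup(row[field_in])."""
    def enrich(row: Row) -> None:
        row[field_out] = lookup(row[field_in])
    return enrich


def log_query(raw_rows: Iterable[Row], enrichers: List[Enricher]) -> List[Row]:
    """Hypothetical logQuery layer: rows are stored raw and every enricher
    runs when the query runs, so results always reflect current sources."""
    results = []
    for raw in raw_rows:
        row = dict(raw)  # work on a copy; stored data is never modified
        for enrich in enrichers:
            enrich(row)
        results.append(row)
    return results


if __name__ == "__main__":
    # Static source: a dict standing in for a local GeoIP file (made-up data)
    geo = {"192.0.2.10": "XA", "198.51.100.7": "XB"}

    # Dynamic source: a stub standing in for a live RBL query (made-up data)
    def rbl_rating(ip: str) -> int:
        return {"198.51.100.7": 5}.get(ip, 0)

    enrichers = [
        make_ip_enricher("SourceIP", "SourceIPCountry", geo.get),
        make_ip_enricher("DestinationIP", "DestinationIPCountry", geo.get),
        make_ip_enricher("SourceIP", "RblRatingOfSourceIP", rbl_rating),
        make_ip_enricher("DestinationIP", "RblRatingOfDestinationIP", rbl_rating),
    ]
    raw = [{"TimeStamp": "2014-03-18T21:02:00", "FirewallName": "fw1",
            "SourceIP": "192.0.2.10", "DestinationIP": "198.51.100.7"}]
    for row in log_query(raw, enrichers):
        print(row)
```

Adding another enrichment source means appending one more function to the list. Because stored rows are never touched, the new source applies to old and new data alike, but every lookup is repeated on every query.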
The problem with this approach is that sometimes you need to do a HUGE amount of
work at query time to do this.
For example, say you want to get a report of how many users you have from each
state in a given time window. If you already did the IP->state lookup and logged
the result, this is a very simple answer (possibly answerable just through index
lookups without having to retrieve any data rows).
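A small Python sketch of the "look it up once and log the result" alternative. The IP-to-state lookup runs when each event is stored, so the per-state report only groups on a field that already exists. The field names (Time, User, ClientIP, State), the helper functions, and the sample data are hypothetical. In a real database, the report would be a GROUP BY on an indexed column. The sketch only shows where the lookup cost lands.

```python
from collections import Counter
from datetime import datetime
from typing import Any, Callable, Dict, Iterable, Optional

Event = Dict[str, Any]


def enrich_at_ingest(event: Event,
                     ip_to_state: Callable[[str], Optional[str]]) -> Event:
    """Do the IP -> state lookup once, when the event is stored."""
    stored = dict(event)
    stored["State"] = ip_to_state(event["ClientIP"])
    return stored


def users_per_state(stored: Iterable[Event],
                    start: datetime, end: datetime) -> Counter:
    """Distinct users per state for events in [start, end).

    Uses only fields already stored with each event; no lookups at query time.
    A user seen from two states is counted once in each.
    """
    pairs = {(e["State"], e["User"]) for e in stored if start <= e["Time"] < end}
    return Counter(state for state, _ in pairs)


if __name__ == "__main__":
    # Made-up lookup data and events, for illustration only
    ip_to_state = {"192.0.2.1": "CA", "192.0.2.2": "CA", "198.51.100.1": "TX"}.get
    events = [
        {"Time": datetime(2014, 3, 18, 10), "User": "alice", "ClientIP": "192.0.2.1"},
        {"Time": datetime(2014, 3, 18, 11), "User": "alice", "ClientIP": "192.0.2.1"},
        {"Time": datetime(2014, 3, 18, 12), "User": "bob", "ClientIP": "192.0.2.2"},
        {"Time": datetime(2014, 3, 18, 13), "User": "carol", "ClientIP": "198.51.100.1"},
    ]
    stored = [enrich_at_ingest(e, ip_to_state) for e in events]
    print(users_per_state(stored, datetime(2014, 3, 18), datetime(2014, 3, 19)))
```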
You can then seamlessly add more intelligence/enrichment sources to
your logQuery API and all data (new and old) is instantly enriched.
Nice, right? :)
Only "instantly" if you ignore the cost of the query.
Some of this trade-off depends on how many times you query the data (if you only
query a small portion of your data, you avoid looking up the info on the rest of
it), but another trade-off is the response time of your query. If you are
investigating a security incident, anything that you can do ahead of time to
make the queries faster can be very worthwhile.
There is a flaw in dynamic enrichment: current intelligence from the
enrichment source may be inaccurate for historic data. Again, a simple
example is DNS. Say 10.0.0.22 was called joe-workstation three months
ago. You run a query for 10.0.0.22 today and pull three months' worth
of logs, but now 10.0.0.22 is called marys-workstation. If you do the
DNS lookup at query time, you will end up with bad data.
Yep.
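One way to avoid the stale-name problem is to keep the mapping time-versioned and resolve each event against the name that was valid at the event's timestamp. Here is a minimal Python sketch. The HistoricalNames class and the sample dates are hypothetical. The sketch also assumes something records each name change when it happens, for example a resolve at ingest time or an import of DNS/DHCP change logs. How that history is populated is outside the sketch.

```python
import bisect
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Optional


class HistoricalNames:
    """Hypothetical IP -> hostname history. Each entry records the time from
    which a name was valid, so old events resolve to the name they had then."""

    def __init__(self) -> None:
        self._times: Dict[str, List[datetime]] = defaultdict(list)
        self._names: Dict[str, List[str]] = defaultdict(list)

    def record(self, ip: str, name: str, valid_from: datetime) -> None:
        # Keep each IP's history sorted by valid_from
        times, names = self._times[ip], self._names[ip]
        i = bisect.bisect_right(times, valid_from)
        times.insert(i, valid_from)
        names.insert(i, name)

    def name_at(self, ip: str, when: datetime) -> Optional[str]:
        times = self._times.get(ip)  # .get does not create an empty entry
        if not times:
            return None
        # Latest name whose valid_from is <= when
        i = bisect.bisect_right(times, when) - 1
        return self._names[ip][i] if i >= 0 else None


if __name__ == "__main__":
    history = HistoricalNames()
    history.record("10.0.0.22", "joe-workstation", datetime(2013, 12, 1))
    history.record("10.0.0.22", "marys-workstation", datetime(2014, 3, 1))
    print(history.name_at("10.0.0.22", datetime(2013, 12, 18)))  # joe-workstation
    print(history.name_at("10.0.0.22", datetime(2014, 3, 19)))   # marys-workstation
```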
At my workplace, we are still evolving tools/methods to handle all
this, so ask me again in a few months and maybe I will have a better
answer :) Till then, maybe someone else can chime in with a better
solution?
I tend to favor spending processor time early to save human wait time later,
especially if the processor time would otherwise be idle.
David Lang
On Tue, Mar 18, 2014 at 9:02 PM, David Lang <[email protected]> wrote:
On Tue, 18 Mar 2014, Otis Gospodnetic wrote:
Hi,
Does rsyslog have anything for doing geoIP lookup?
Currently, no, it does not. We have a feature specced out, but not implemented,
that would provide this capability (table lookups), but the sponsor backed
out and nobody has picked it up (either as a sponsor or to develop it).
David Lang
If not, how are others handling this (other than passing logs through
Logstash)?
A quick Google found https://github.com/mricon/howler , but that's
obviously not quite the same as figuring out the
city/country/lat,lon/postal code info from IP addresses.
Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
LIKE THAT.