Ottomata has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/352582 )
Change subject: Add bot filter to mysql consumer
......................................................................
Add bot filter to mysql consumer
Wraps mysql consumer's input URI with a filter that prevents events
triggered by bots/spiders from reaching mysql.
Bug: T67508
Change-Id: I3c82e63b498a56f503a6af22ca1278d0d1c27063
---
A modules/eventlogging/files/filters.py
M modules/role/manifests/eventlogging/analytics/mysql.pp
2 files changed, 19 insertions(+), 1 deletion(-)
Approvals:
Ottomata: Looks good to me, approved
jenkins-bot: Verified
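
For illustration only, here is a minimal sketch of how the is_not_bot predicate added in filters.py treats different events. The function body is copied from this change; the sample events and the 'schema' and 'browser_family' field names are assumptions made up for the example, not taken from a real EventLogging stream.

```python
import json


def is_not_bot(e):
    # Same logic as filters.py in this change.
    try:
        return not json.loads(e['userAgent'])['is_bot']
    except (ValueError, KeyError):
        # Unparseable or missing user agent data: keep the event.
        return True


# Hypothetical sample events; only 'userAgent' and 'is_bot' come from the change.
events = [
    {'schema': 'Edit', 'userAgent': json.dumps({'browser_family': 'Firefox', 'is_bot': False})},
    {'schema': 'Edit', 'userAgent': json.dumps({'browser_family': 'Googlebot', 'is_bot': True})},
    {'schema': 'Edit', 'userAgent': 'not json'},  # json.loads raises ValueError
    {'schema': 'Edit'},                           # no 'userAgent' key: KeyError
]

# Only the event explicitly flagged as a bot is dropped.
kept = [e for e in events if is_not_bot(e)]
```

Note that the predicate fails open: events whose userAgent is missing, is not valid JSON, or lacks an is_bot key are kept. A userAgent value that is not a string (for example an already decoded dict) would raise TypeError, which the function as written does not catch.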
diff --git a/modules/eventlogging/files/filters.py b/modules/eventlogging/files/filters.py
new file mode 100644
index 0000000..a60a92d
--- /dev/null
+++ b/modules/eventlogging/files/filters.py
@@ -0,0 +1,8 @@
+import json
+
+
+def is_not_bot(e):
+    try:
+        return not json.loads(e['userAgent'])['is_bot']
+    except (ValueError, KeyError):
+        return True
diff --git a/modules/role/manifests/eventlogging/analytics/mysql.pp b/modules/role/manifests/eventlogging/analytics/mysql.pp
index 54c90b1..1b410a1 100644
--- a/modules/role/manifests/eventlogging/analytics/mysql.pp
+++ b/modules/role/manifests/eventlogging/analytics/mysql.pp
@@ -18,6 +18,10 @@
labs => '127.0.0.1/log',
}
+ eventlogging::plugin { 'filters':
+ source => 'puppet:///modules/eventlogging/filters.py',
+ }
+
# Run N parallel mysql consumers processors.
# These will auto balance amongst themselves.
$mysql_consumers = hiera(
@@ -43,10 +47,16 @@
# For beta cluster, set in https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep
$statsd_host = hiera('eventlogging_statsd_host', 'statsd.eqiad.wmnet')
+ # Filtering function to use on events consumed by mysql
+ $filter_function = '&function=is_not_bot'
+
+ # Custom URI scheme to pass events through filter
+ $filter_scheme = 'filter://'
+
# Kafka consumer group for this consumer is mysql-m4-master
eventlogging::service::consumer { $mysql_consumers:
# auto commit offsets to kafka more often for mysql consumer
- input => "${kafka_mixed_uri}&auto_commit_interval_ms=1000${$kafka_api_version_param}",
+ input => "${filter_scheme}${kafka_mixed_uri}&auto_commit_interval_ms=1000${$kafka_api_version_param}${filter_function}",
output => "mysql://${mysql_user}:${mysql_pass}@${mysql_db}?charset=utf8&statsd_host=${statsd_host}&replace=True",
sid => $kafka_consumer_group,
# Restrict permissions on this config file since it contains a password.
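
As a rough sketch of what the new input string looks like once Puppet has interpolated it, the snippet below rebuilds the same concatenation in Python. The values of kafka_mixed_uri and kafka_api_version_param are hypothetical placeholders (the real ones come from the manifest and Hiera and are not shown in this change), and the unwrapping at the end is only a string-level illustration, not the way the EventLogging filter handler actually parses its URI.

```python
# Hypothetical placeholder values; the real ones are defined elsewhere in Puppet/Hiera.
kafka_mixed_uri = 'kafka:///broker.example:9092?topic=eventlogging-valid-mixed'
kafka_api_version_param = '&api_version=0.9'

# Values introduced by this change.
filter_scheme = 'filter://'
filter_function = '&function=is_not_bot'

# Mirrors the interpolated 'input' string in mysql.pp after the change.
input_uri = (
    f'{filter_scheme}{kafka_mixed_uri}'
    f'&auto_commit_interval_ms=1000{kafka_api_version_param}{filter_function}'
)
print(input_uri)

# String-level unwrapping, for illustration only: strip the scheme prefix,
# then split off the trailing function parameter.
without_scheme = input_uri[len(filter_scheme):]
inner_uri, _, function_name = without_scheme.rpartition('&function=')
```

The filter scheme is prepended and the function parameter is appended, so the original kafka URI (including its own query parameters) sits unchanged between the two.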
--
To view, visit https://gerrit.wikimedia.org/r/352582
Gerrit-MessageType: merged
Gerrit-Change-Id: I3c82e63b498a56f503a6af22ca1278d0d1c27063
Gerrit-PatchSet: 9
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Fdans <[email protected]>
Gerrit-Reviewer: Elukey <[email protected]>
Gerrit-Reviewer: Fdans <[email protected]>
Gerrit-Reviewer: Ottomata <[email protected]>
Gerrit-Reviewer: jenkins-bot <>