Ottomata has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/352582 )

Change subject: Add bot filter to mysql consumer
......................................................................


Add bot filter to mysql consumer

Wraps mysql consumer's input URI with a filter that prevents events
triggered by bots/spiders from reaching mysql.
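
For illustration only, a minimal sketch of how the predicate added in
filters.py behaves on a few events. The `filter_events` helper and the
sample events are hypothetical stand-ins; the actual handling of the
`filter://` scheme inside EventLogging is not shown in this change.

```python
import json


def is_not_bot(e):
    # Same logic as the is_not_bot function added in filters.py.
    try:
        return not json.loads(e['userAgent'])['is_bot']
    except (ValueError, KeyError):
        # Unparseable or missing user agent data: keep the event.
        return True


def filter_events(events, predicate):
    """Hypothetical stand-in for the filter wrapper: yield events that pass."""
    return (e for e in events if predicate(e))


# Made-up events; only the 'userAgent' field matters to the predicate.
sample_events = [
    {'schema': 'A', 'userAgent': json.dumps({'is_bot': True})},
    {'schema': 'B', 'userAgent': json.dumps({'is_bot': False})},
    {'schema': 'C', 'userAgent': 'Mozilla/5.0'},  # not JSON, so kept
    {'schema': 'D'},                              # no userAgent, so kept
]

kept = [e['schema'] for e in filter_events(sample_events, is_not_bot)]
print(kept)
```

Only the event explicitly flagged with `is_bot: true` is dropped; events
whose user agent is missing or is not valid JSON pass through.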

Bug: T67508
Change-Id: I3c82e63b498a56f503a6af22ca1278d0d1c27063
---
A modules/eventlogging/files/filters.py
M modules/role/manifests/eventlogging/analytics/mysql.pp
2 files changed, 19 insertions(+), 1 deletion(-)

Approvals:
  Ottomata: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/modules/eventlogging/files/filters.py b/modules/eventlogging/files/filters.py
new file mode 100644
index 0000000..a60a92d
--- /dev/null
+++ b/modules/eventlogging/files/filters.py
@@ -0,0 +1,8 @@
+import json
+
+
+def is_not_bot(e):
+    try:
+        return not json.loads(e['userAgent'])['is_bot']
+    except (ValueError, KeyError):
+        return True
diff --git a/modules/role/manifests/eventlogging/analytics/mysql.pp b/modules/role/manifests/eventlogging/analytics/mysql.pp
index 54c90b1..1b410a1 100644
--- a/modules/role/manifests/eventlogging/analytics/mysql.pp
+++ b/modules/role/manifests/eventlogging/analytics/mysql.pp
@@ -18,6 +18,10 @@
         labs       => '127.0.0.1/log',
     }
 
+    eventlogging::plugin { 'filters':
+        source => 'puppet:///modules/eventlogging/filters.py',
+    }
+
     # Run N parallel mysql consumers processors.
     # These will auto balance amongst themselves.
     $mysql_consumers = hiera(
@@ -43,10 +47,16 @@
     # For beta cluster, set in https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep
     $statsd_host          = hiera('eventlogging_statsd_host', 'statsd.eqiad.wmnet')
 
+    # Filtering function to use on events consumed by mysql
+    $filter_function      = '&function=is_not_bot'
+
+    # Custom URI scheme to pass events through filter
+    $filter_scheme        = 'filter://'
+
     # Kafka consumer group for this consumer is mysql-m4-master
     eventlogging::service::consumer { $mysql_consumers:
         # auto commit offsets to kafka more often for mysql consumer
-        input  => "${kafka_mixed_uri}&auto_commit_interval_ms=1000${$kafka_api_version_param}",
+        input  => "${filter_scheme}${kafka_mixed_uri}&auto_commit_interval_ms=1000${$kafka_api_version_param}${filter_function}",
         output => "mysql://${mysql_user}:${mysql_pass}@${mysql_db}?charset=utf8&statsd_host=${statsd_host}&replace=True",
         sid    => $kafka_consumer_group,
         # Restrict permissions on this config file since it contains a password.

-- 
To view, visit https://gerrit.wikimedia.org/r/352582
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I3c82e63b498a56f503a6af22ca1278d0d1c27063
Gerrit-PatchSet: 9
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Fdans <[email protected]>
Gerrit-Reviewer: Elukey <[email protected]>
Gerrit-Reviewer: Fdans <[email protected]>
Gerrit-Reviewer: Ottomata <[email protected]>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits