OliverKeyes has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/193982

Change subject: Include mediawiki.org pageviews
......................................................................

Include mediawiki.org pageviews

MediaWiki.org pageviews should be counted, and currently aren't. Rather
than screw with the structure of the existing regular expressions
(which neatly handle *.wikimedia and *.wiki* projects, respectively),
I've converted the wikidata string (which must be handled separately
because of the www. prefix) into a regular expression that also
matches mediawiki.org.

Also included is a unit test. Everyone loves unit tests.
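
For reviewers who want to poke at the new host pattern in isolation, here
is a minimal sketch. The regular expression is copied from the patch; the
class name, the isOtherProjectHost wrapper and the body of patternIsFound
are assumptions made for illustration (the real helper lives in
Pageview.java and is not shown in this diff; the sketch assumes it uses
Matcher.find()).

```java
import java.util.regex.Pattern;

public class OtherProjectsHostSketch {
    // Same expression as the patch: anchored at the end, not at the start.
    private static final Pattern URI_HOST_OTHER_PROJECTS = Pattern.compile(
        "(wikidata|mediawiki)\\.org$"
    );

    // Assumed stand-in for the patternIsFound helper in Pageview.java.
    static boolean patternIsFound(Pattern pattern, String target) {
        return target != null && pattern.matcher(target).find();
    }

    // Hypothetical convenience wrapper, not part of the patch.
    static boolean isOtherProjectHost(String uriHost) {
        return patternIsFound(URI_HOST_OTHER_PROJECTS, uriHost);
    }

    public static void main(String[] args) {
        String[] hosts = {
            "www.wikidata.org",              // true
            "www.mediawiki.org",             // true
            "mediawiki.org",                 // true, no www. required
            "www.mediawiki.org.example.com", // false, must end in .org
            "notmediawiki.org"               // true, no anchor at the start
        };
        for (String host : hosts) {
            System.out.println(host + " -> " + isOtherProjectHost(host));
        }
    }
}
```

Under that assumption, the last case shows that any host ending in
wikidata.org or mediawiki.org is accepted, because the expression is only
anchored at the end.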

Change-Id: I3a80cb75f175f715f0a369879ee2851217d0ffd1
---
M refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Pageview.java
M refinery-core/src/test/resources/pageview_test_data.csv
2 files changed, 5 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/analytics/refinery/source refs/changes/82/193982/1

diff --git a/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Pageview.java b/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Pageview.java
index b4d195a..3a9bfda 100644
--- a/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Pageview.java
+++ b/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Pageview.java
@@ -38,7 +38,9 @@
         + "inews|ipedia|iquote|isource|tionary|iversity|ivoyage))\\.org$"
     );
 
-    private static final String uriHostWikidataString = "www.wikidata.org";
+    private static final Pattern uriHostOtherProjectsPattern = Pattern.compile(
+        "(wikidata|mediawiki)\\.org$"
+    );
 
     private static final Pattern uriPathPattern = Pattern.compile(
         "^(/sr(-(ec|el))?|/w(iki)?|/zh(-(cn|hans|hant|hk|mo|my|sg|tw))?)/"
@@ -175,7 +177,7 @@
             // or a 'project' domain, e.g. en.wikipedia.org
             &&  (
                     patternIsFound(uriHostWikimediaDomainPattern, uriHost)
-                    || stringContains(uriHost, uriHostWikidataString)
+                    || patternIsFound(uriHostOtherProjectsPattern, uriHost)
                     || patternIsFound(uriHostProjectDomainPattern, uriHost)
                 )
             // Either a pageview's uriPath will match the first pattern,
diff --git a/refinery-core/src/test/resources/pageview_test_data.csv b/refinery-core/src/test/resources/pageview_test_data.csv
index 9ba352e..fb99fd3 100644
--- a/refinery-core/src/test/resources/pageview_test_data.csv
+++ b/refinery-core/src/test/resources/pageview_test_data.csv
@@ -14,6 +14,7 @@
 Is Pageview – Desktop - Chinese zh-sg, true,false,174.62.175.93,-,zh.wikipedia.org,/zh-sg/Wikipedia:首页,-,200,text/html,Five-test plan
 Is Pageview – Desktop - Chinese zh-tw, true,false,174.62.175.94,-,zh.wikipedia.org,/zh-tw/Wikipedia:首页,-,200,text/html,Five-test plan
 Is Pageview – Wikidata, true, true,174.62.175.94,-,www.wikidata.org,/wiki/Q5651758,-,200,text/html,Five-test plan
+Is Pageview – MediaWiki, true, true,174.62.175.94,-,www.mediawiki.org,/wiki/Gerrit/git-review,-,200,text/html,Five-test plan
 Is Not Pageview - http_status != 200, false,true,174.62.175.95,-, en.wikipedia.org, /wiki/Noppperrrrs,-,400,text/html ,turnip
 Is Not Pageview - content_type does not match, false,true,174.62.175.96,-, en.wikipedia.org, /wiki/Noppperrrrs,-,200, image/png, turnip
 Is Not Pageview - API stupidity: it outputs a 200 status code and text/html as a MIME type on certain classes of error., false, false,174.62.175.97,-, en.wikipedia.org, /w/api.php,-,200, text/html, turnip

-- 
To view, visit https://gerrit.wikimedia.org/r/193982
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I3a80cb75f175f715f0a369879ee2851217d0ffd1
Gerrit-PatchSet: 1
Gerrit-Project: analytics/refinery/source
Gerrit-Branch: master
Gerrit-Owner: OliverKeyes <oke...@wikimedia.org>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
