OliverKeyes has uploaded a new change for review.
https://gerrit.wikimedia.org/r/258197
Change subject: Add spider detection to WDQS data collection code
......................................................................
Add spider detection to WDQS data collection code
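Flag requests whose agent_type is 'spider' as automata, carry that flag through the WDQS aggregates as an is_automata column, and write the results to wdqs_aggregates_new.tsv.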
Bug: T121106
Change-Id: If36023abff3486532a6989157aff94ca1c075b0b
---
M wdqs/basic_usage.R
1 file changed, 5 insertions(+), 3 deletions(-)
git pull ssh://gerrit.wikimedia.org:29418/wikimedia/discovery/golden refs/changes/97/258197/1
diff --git a/wdqs/basic_usage.R b/wdqs/basic_usage.R
index 41b2988..cd56174 100644
--- a/wdqs/basic_usage.R
+++ b/wdqs/basic_usage.R
@@ -21,23 +21,25 @@
query <- paste0("USE wmf;
SELECT year, month, day, uri_path,
UPPER(http_status IN('200','304')) as success,
+ CASE WHEN agent_type = 'spider' THEN 'TRUE' ELSE 'FALSE' END AS is_automata,
COUNT(*) AS n
FROM webrequest",
subquery,
"AND webrequest_source = 'misc'
AND uri_host = 'query.wikidata.org'
AND uri_path IN('/', '/bigdata/namespace/wdq/sparql')
- GROUP BY year, month, day, uri_path,
- UPPER(http_status IN('200','304'));")
+ GROUP BY year, month, day, uri_path, UPPER(http_status IN('200','304')),
+ CASE WHEN agent_type = 'spider' THEN 'TRUE' ELSE 'FALSE' END;")
results <- query_hive(query)
output <- data.frame(date = as.Date(paste(results$year, results$month,
results$day, sep = "-")),
path = results$uri_path,
http_success = results$success,
+ is_automata = results$is_automata,
events = results$n,
stringsAsFactors = FALSE)
# Write out
- conditional_write(output, file.path(base_path, "wdqs_aggregates.tsv"))
+ conditional_write(output, file.path(base_path, "wdqs_aggregates_new.tsv"))
}
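A minimal local sketch of what the new is_automata flag does to the aggregation, assuming a small made-up data frame in place of the Hive webrequest table. The column names follow the query, but the rows (dates, statuses, agent types) are hypothetical; only the grouping logic mirrors the patch. Each row is marked "TRUE"/"FALSE" for success (status 200 or 304) and for automata (agent_type == 'spider'), and is then counted per group, as COUNT(*) ... GROUP BY does in Hive.

```r
# Hypothetical rows standing in for webrequest records (values are made up)
requests <- data.frame(
  year = 2015L, month = 12L, day = 10L,
  uri_path = c("/", "/bigdata/namespace/wdq/sparql",
               "/bigdata/namespace/wdq/sparql", "/"),
  http_status = c("200", "200", "500", "304"),
  agent_type = c("user", "spider", "user", "user"),
  stringsAsFactors = FALSE
)

# Same two flags the query derives, as "TRUE"/"FALSE" strings
requests$success <- ifelse(requests$http_status %in% c("200", "304"), "TRUE", "FALSE")
requests$is_automata <- ifelse(requests$agent_type == "spider", "TRUE", "FALSE")

# Count rows per (date, path, success, is_automata) group, like COUNT(*) ... GROUP BY
requests$n <- 1L
counts <- aggregate(n ~ year + month + day + uri_path + success + is_automata,
                    data = requests, FUN = sum)

# Build the output frame the same way the patched script does
output <- data.frame(date = as.Date(paste(counts$year, counts$month, counts$day, sep = "-")),
                     path = counts$uri_path,
                     http_success = counts$success,
                     is_automata = counts$is_automata,
                     events = counts$n,
                     stringsAsFactors = FALSE)

# A downstream consumer can then drop spider traffic by filtering on the flag
human_only <- output[output$is_automata == "FALSE", ]
```

Because is_automata is a grouping column rather than a WHERE filter, spider traffic is still counted; it simply lands in its own rows, so consumers can choose whether to include it.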
--
Gerrit-MessageType: newchange
Gerrit-Change-Id: If36023abff3486532a6989157aff94ca1c075b0b
Gerrit-PatchSet: 1
Gerrit-Project: wikimedia/discovery/golden
Gerrit-Branch: master
Gerrit-Owner: OliverKeyes