Hi all,

tl;dr: we'd like to remove the rev_is_revert and rev_revert_details fields
from the mediawiki.revision-create stream to solve a missing event problem.

For years now, we've known that the mediawiki.revision-create stream
<https://stream.wikimedia.org/?doc#/streams/get_v2_stream_mediawiki_revision_create>
has been missing many real revision create events
<https://phabricator.wikimedia.org/T215001> when compared with
MediaWiki's MySQL databases.  This makes the stream almost useless for
those who want to use it as a notification mechanism about all MediaWiki
page changes.

The large number of missing events is caused by the code that emits the
event subscribing to the wrong MediaWiki hook.  This patch
<https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventBus/+/679353/> will
fix this. However, the correct hook does not give us the information we need
to set the rev_is_revert and rev_revert_details fields.  These fields are
relatively new (only added in August 2020
<https://github.com/wikimedia/schemas-event-primary/commit/53b6480cb1045316ce7bf16987e6169fa386450f#diff-70a054c62940bbabcef7a38e58eb4bf4d9001ed46dd6277473509e5775ec5d34R53-R94>).
We think that including the missing revisions is more important than
capturing the revert information, which really only captures whether or not
a user used the MediaWiki UI to issue a revert.
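
For anyone consuming the stream who wants to check how their code would cope
with this change, here is a minimal sketch in Python.  It assumes events arrive
as JSON objects and only shows one way of treating the two revert fields as
optional.  The field names rev_is_revert and rev_revert_details are the ones
discussed above; the helper name revert_info, the sample payloads, and the
keys inside them (rev_id, example_key) are made up for illustration and are
not taken from the schema.

```python
import json
from typing import Any, Dict, Optional, Tuple


def revert_info(event: Dict[str, Any]) -> Tuple[Optional[bool], Optional[Dict[str, Any]]]:
    """Return (is_revert, details) for one revision-create event.

    Hypothetical helper: is_revert is None when the event carries no
    rev_is_revert field at all, so callers can tell "unknown" from "False".
    """
    is_revert = event.get("rev_is_revert")
    # Only look at the details when the event says it is a revert.
    details = event.get("rev_revert_details") if is_revert else None
    return is_revert, details


# Illustrative payloads only; real events contain many more fields.
with_fields = json.loads(
    '{"rev_id": 1, "rev_is_revert": true,'
    ' "rev_revert_details": {"example_key": "example_value"}}'
)
without_fields = json.loads('{"rev_id": 2}')

print(revert_info(with_fields))
print(revert_info(without_fields))
```

Code written this way keeps working whether or not the fields are present,
since a missing field is reported as None rather than raising a KeyError.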

We plan on moving forward with this, but would like feedback before we do.
If you have objections, or other ideas on how we can provide this data
(like maybe including it in mediawiki/revision-tags-change
<https://schema.wikimedia.org/repositories//primary/jsonschema/mediawiki/revision/tags-change/current.yaml>
and making that public?), let us know by replying to this email or in this
ticket: https://phabricator.wikimedia.org/T215001

Thanks!
-Andrew Otto
 SRE, Data Engineering, WMF
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
