Bawolff moved this task from Scheduled to Done on the Security-Reviews board.
Bawolff added a comment.

Security review of Cognate extension (As of a187d934b8c6)

Normal issues

  • Use sha256 (or some other modern hash) instead of md5. Even in contexts where the attacks against it is not important we should be phasing out md5.

Issues of unclear severity

64-bits is not enough for collision resistance against a malicious adversary. If I have my math right (It is very possible that I mixed this up) There are 5 million (~2^22) titles on enwiktionary. 2^64/2^22 = 2^42. Based on googling, a 4 GPU system can do roughly 2^33 hashes/second. 2^42/2^33 = 2^11 seconds ~ half an hour to find a collision. So a vandal could make malicious page titles that collide with titles on enwiktionary, for the purpose of vandalism via the lang links. I'm not sure how serious an issue this is. Its clearly a lot more work for less disruption than most forms of vandalism, but it could be confusing to users who wouldn't understand what's going on.

I think it may be an acceptable risk, but users should be consulted about the possibility before the extension is enabled. The risk should also be documented.

Alternatively the risk could be prevented by using autoincrement ids (would require storing both the normalized and raw title though) or by increasing the size of the hash to 128 bits (e.g. Having two BIG INT fields and joining on both)

Minor issues

  • line 68 of populateCognatePages.php Its preferred to use the MW DB abstraction layer instead of hand constructing sql where possible (instead of "page_namespace IN " . ..., do "page_namespace" => $namespaces).

Non-security issues

  • CacheUpdateJob doesn't update page_touched. More generally shouldn't this use built-in HTMLCacheUpdateJob for forward compatibility with any future changes to how caching works in MediaWiki? Also this would allow core's deduplication code to be used.
    • At one point I thought that this would be unnecessary, since the language links from this extension are generated on every (non-varnish cached) view (as opposed to stored in parser cache). However it actually is necessary since MW will give 304 not modified responses based on page_touched.
  • This assumes that all sites have same interwiki structure for language links or things will break. This is fine since its true on wiktionary, however I think this requirement should be clearly documented.
  • The indexe on cognatePages for the queries from selectLinkDetailsForPage, selectSitesForPage seem to be not quite right. I think the index should be (cpga_title, cpga_namespace, cpga_sites).
  • I assume this is not using the LinkUpdates related hooks because it only wants to deal with new pages not edits (?) However it seems that this still triggers on all edits. Perhaps it should look at the EDIT_NEW flag in the onPageSaveComplete hook to avoid some unnecessary database interaction.

TASK DETAIL
https://phabricator.wikimedia.org/T149082

WORKBOARD
https://phabricator.wikimedia.org/project/board/944/

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Bawolff
Cc: Bawolff, daniel, Aklapper, Addshore, Lydia_Pintscher, D3r1ck01, Andrew-WMDE, dpatrick, Izno, Luke081515, Wikidata-bugs, aude, JanZerebecki, Darkdadaah, csteipp, Mbch331, Jay8g, Legoktm
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs

Reply via email to